BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion
Pith reviewed 2026-05-15 23:09 UTC · model grok-4.3
The pith
A compact motion-tracking setup plus classifier-guided latent diffusion lets one humanoid policy master diverse agile skills and solve unseen tasks zero-shot.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BeyondMimic establishes that a compact motion-tracking formulation masters a wide range of agile behaviors, including aerial cartwheels, spin-kicks, flip-kicks, and sprinting, under a single setup with shared hyperparameters, while achieving state-of-the-art naturalness. A unified latent diffusion model then uses classifier guidance to enable goal specification, seamless task switching, and dynamic composition, allowing the system to solve downstream tasks never seen in training, such as motion inpainting, joystick teleoperation, and obstacle avoidance, with zero-shot transfer to real hardware.
What carries the argument
Unified latent diffusion model with classifier guidance, which steers generation toward novel objectives at test time while preserving the motions learned from the compact tracking stage.
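To make the mechanism concrete, here is a minimal sketch of classifier-guided reverse diffusion sampling. The denoiser interface, the guidance cost, and the DDPM-style update are generic illustrative assumptions, not the paper's actual implementation.

```python
import torch

@torch.no_grad()
def guided_sample(denoiser, guidance_cost, betas, shape, guidance_scale=1.0):
    """One pass of classifier-guided reverse diffusion (generic DDPM-style sketch).
    `denoiser(x, t)` predicts the noise added at step t; `guidance_cost(x, t)`
    is a differentiable task cost whose gradient steers sampling."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                         # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t)                       # predicted noise at step t
        # Unguided DDPM posterior mean for x_{t-1}.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        # Classifier guidance: shift the mean against the gradient of the task cost.
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            grad = torch.autograd.grad(guidance_cost(x_in, t).sum(), x_in)[0]
        mean = mean - guidance_scale * betas[t] * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise    # sample x_{t-1}
    return x
```

With a goal-reaching or obstacle cost plugged in as `guidance_cost`, the same trained model can be steered toward objectives it never saw during training, which is the versatility claim at stake.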
If this is right
- One fixed training setup suffices for many agile skills without motion-specific tuning.
- Classifier guidance extends the model to tasks absent from training data.
- Skills transfer zero-shot from simulation to physical humanoid hardware.
- Behaviors can be composed and switched dynamically for complex sequences.
Where Pith is reading between the lines
- The method may scale to longer-horizon tasks by chaining multiple guided diffusion steps under a high-level planner.
- It could reduce reliance on hand-designed reward functions when training controllers for new environments.
- Similar guidance techniques might apply to other robot morphologies once a base motion library exists.
Load-bearing premise
Classifier guidance during diffusion sampling can reliably steer outputs toward novel objectives while keeping motions natural and stable, without any task-specific retraining.
What would settle it
Running the guided diffusion on a new task such as obstacle avoidance and observing either collisions or visibly unnatural, unstable motions on the real robot.
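That experiment needs little more than a differentiable task cost for the guidance term. A hypothetical obstacle-avoidance cost of the kind such a test might use could look like the following; the latent decoder and obstacle layout are assumptions, not the paper's API.

```python
import torch

def obstacle_cost(latent, t, decode_base_xy, obstacles, margin=0.3):
    """Hypothetical guidance cost penalizing planned base positions that come
    within `margin` meters of circular obstacles.
    `decode_base_xy(latent, t)` maps the latent plan to (T, 2) base positions;
    `obstacles` is an (N, 3) tensor of (x, y, radius)."""
    xy = decode_base_xy(latent, t)                               # (T, 2) planned positions
    dist = torch.cdist(xy, obstacles[:, :2]) - obstacles[:, 2]   # (T, N) clearance to each obstacle
    return torch.clamp(margin - dist, min=0.0).pow(2).sum()      # hinge penalty inside the margin
```

Collisions or visibly unstable motions under such a cost on the real robot would count against the load-bearing premise; clean avoidance would support it.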
Original abstract
The human-like form of humanoid robots positions them uniquely to achieve the agility and versatility in motor skills that humans possess. Learning from human demonstrations offers a scalable approach to acquiring these capabilities. However, prior works either produce unnatural motions or rely on motion-specific tuning to achieve satisfactory naturalness. Furthermore, these methods are often motion- or goal-specific, lacking the versatility to compose diverse skills, especially when solving unseen tasks. We present BeyondMimic, a framework that scales to diverse motions and carries the versatility to compose them seamlessly in tackling unseen downstream tasks. At heart, a compact motion-tracking formulation enables mastering a wide range of radically agile behaviors, including aerial cartwheels, spin-kicks, flip-kicks, and sprinting, with a single setup and shared hyperparameters, all while achieving state-of-the-art human-like performance. Moving beyond the mere imitation of existing motions, we propose a unified latent diffusion model that empowers versatile goal specification, seamless task switching, and dynamic composition of these agile behaviors. Leveraging classifier guidance, a diffusion-specific technique for test-time optimization toward novel objectives, our model extends its capability to solve downstream tasks never encountered during training, including motion inpainting, joystick teleoperation, and obstacle avoidance, and transfers these skills zero-shot to real hardware. This work opens new frontiers for humanoid robots by pushing the limits of scalable human-like motor skill acquisition from human motion and advancing seamless motion synthesis that achieves generalization and versatility beyond training setups.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BeyondMimic, a framework for humanoid robot control that first uses a compact motion-tracking formulation to learn a wide range of agile behaviors (aerial cartwheels, spin-kicks, flip-kicks, sprinting) from human demonstrations under a single setup and shared hyperparameters. It then employs a unified latent diffusion model with classifier guidance to enable versatile goal specification, seamless composition, and solution of unseen downstream tasks (motion inpainting, joystick teleoperation, obstacle avoidance), with zero-shot transfer to real hardware and a claim of state-of-the-art human-like naturalness.
Significance. If the empirical claims hold with proper quantitative support, the work would be significant for demonstrating scalable, human-demonstration-driven acquisition of versatile agile skills in humanoids without per-motion or per-task retraining, advancing generalization beyond imitation to test-time guidance for novel objectives and real-world deployment.
major comments (3)
- [Abstract] The central claim of 'state-of-the-art human-like performance' and 'single setup and shared hyperparameters' for radically agile behaviors is not accompanied by any quantitative metrics, baseline comparisons, or ablation results in the manuscript description, leaving the performance advantage and hyperparameter invariance unverified.
- [Abstract] The assertion, in the abstract and main claims, that a fixed classifier guidance scale (the only listed free parameter) steers sampling to unseen tasks while preserving naturalness and hardware stability without per-task rescaling or auxiliary losses is load-bearing for the 'no retraining, single setup' contribution, yet no evidence or sensitivity analysis is provided to rule out the need for task-specific tuning of this coefficient.
- [Abstract] The zero-shot hardware transfer claim for downstream tasks is presented without reported failure-mode analysis, stability metrics on hardware, or comparison to task-specific baselines, which is necessary to substantiate that the diffusion model generalizes without post-hoc adjustments.
minor comments (1)
- Notation for the latent diffusion model and classifier guidance could be clarified with explicit equations showing how the guidance term is added during sampling, as the current description leaves the precise formulation ambiguous.
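For reference, the standard classifier-guidance formulation from the diffusion literature, which this comment presumably has in mind, is sketched below; the paper's latent-space variant may differ in detail.

```latex
% Generic classifier guidance (standard form, not necessarily the paper's exact formulation):
% the conditional score is approximated by adding a weighted guidance gradient,
\nabla_{x_t} \log p(x_t \mid y) \;\approx\; \nabla_{x_t} \log p(x_t) + w \, \nabla_{x_t} \log p_{\phi}(y \mid x_t),
% which shifts the mean of each reverse step before sampling:
\tilde{\mu}_{\theta}(x_t, y) = \mu_{\theta}(x_t) + w \, \Sigma_{\theta}(x_t) \, \nabla_{x_t} \log p_{\phi}(y \mid x_t),
\qquad x_{t-1} \sim \mathcal{N}\bigl(\tilde{\mu}_{\theta}(x_t, y), \, \Sigma_{\theta}(x_t)\bigr).
```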
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate revisions to strengthen the presentation of our results and claims.
Point-by-point responses
- Referee: [Abstract] The central claim of 'state-of-the-art human-like performance' and 'single setup and shared hyperparameters' for radically agile behaviors is not accompanied by any quantitative metrics, baseline comparisons, or ablation results in the manuscript description, leaving the performance advantage and hyperparameter invariance unverified.
  Authors: We agree that the abstract should explicitly reference supporting quantitative evidence. The full manuscript contains detailed quantitative evaluations, including success rates, naturalness metrics, baseline comparisons, and ablations confirming the single-setup performance across agile behaviors. In the revised version we will update the abstract to include key metrics and comparisons drawn from these experiments. revision: yes
- Referee: [Abstract] The assertion, in the abstract and main claims, that a fixed classifier guidance scale (the only listed free parameter) steers sampling to unseen tasks while preserving naturalness and hardware stability without per-task rescaling or auxiliary losses is load-bearing for the 'no retraining, single setup' contribution, yet no evidence or sensitivity analysis is provided to rule out the need for task-specific tuning of this coefficient.
  Authors: The experiments in the manuscript apply a single fixed guidance scale to multiple downstream tasks and report consistent success without per-task retuning. To directly address the request for evidence, we will add a sensitivity analysis section in the revision that varies the scale over a range and reports resulting task performance and stability metrics. revision: partial
- Referee: [Abstract] The zero-shot hardware transfer claim for downstream tasks is presented without reported failure-mode analysis, stability metrics on hardware, or comparison to task-specific baselines, which is necessary to substantiate that the diffusion model generalizes without post-hoc adjustments.
  Authors: We acknowledge that additional hardware-specific analysis would strengthen the zero-shot transfer claims. The manuscript reports successful real-world deployment, but we will expand the hardware section in revision to include failure-mode statistics, quantitative stability metrics, and available comparisons to task-specific baselines. revision: yes
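The sensitivity analysis promised in the second response could be as small as a sweep over the guidance scale with success and stability logged per task. A minimal sketch follows; the rollout interface and metric names are assumptions, not reported quantities.

```python
import numpy as np

def sweep_guidance_scale(rollout_fn, scales=(0.0, 0.5, 1.0, 2.0, 4.0, 8.0), episodes=20):
    """Hypothetical sensitivity sweep: evaluate the guided controller at each scale
    so any need for per-task retuning of the coefficient becomes visible.
    `rollout_fn(scale)` is assumed to return a dict with 'success', 'fell',
    and 'tracking_error' for one episode."""
    summary = {}
    for scale in scales:
        runs = [rollout_fn(scale) for _ in range(episodes)]
        summary[scale] = {
            "success_rate": float(np.mean([r["success"] for r in runs])),
            "fall_rate": float(np.mean([r["fell"] for r in runs])),
            "tracking_error": float(np.mean([r["tracking_error"] for r in runs])),
        }
    return summary
```

A flat success and fall profile across scales and tasks would support the fixed-coefficient claim; a narrow per-task sweet spot would undercut it.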
Circularity Check
No circularity: empirical framework with independent experimental validation
full rationale
The paper's core claims rest on training a compact motion-tracking policy and a latent diffusion model, followed by test-time classifier guidance for novel tasks. No derivation chain reduces any result to its inputs by construction; there are no equations presented that equate a 'prediction' to a fitted parameter or rename an input as an output. Versatility and zero-shot transfer are asserted via empirical results on agile motions and hardware transfer, which remain falsifiable outside the training distribution. Self-citations, if present, are not load-bearing for the central claims.
Axiom & Free-Parameter Ledger
free parameters (1)
- classifier guidance scale
axioms (1)
- domain assumption: Human motion capture data contains sufficient coverage of agile behaviors for generalization via diffusion
Forward citations
Cited by 19 Pith papers
- ReActor: Reinforcement Learning for Physics-Aware Motion Retargeting
  ReActor jointly optimizes motion retargeting and RL policy training with an approximate gradient to generate physically consistent robot motions from human references using only sparse body correspondences.
- TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos
  TT4D delivers a large-scale dataset of high-fidelity 3D table tennis gameplay reconstructed from monocular videos using a novel lift-first pipeline that infers ball trajectories and spin while handling occlusions.
- Physics-Informed Reinforcement Learning of Spatial Density Velocity Potentials for Map-Free Racing
  A DRL policy learns racing controls from depth spectral distributions using a non-geometric physics-informed reward, achieving 12% better performance than humans on out-of-distribution tracks with under 1% of baseline...
- Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids
  Rhythm transfers interactive whole-body behaviors from simulation to real dual Unitree G1 humanoids via interaction-aware retargeting and graph-reward RL.
- BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
  BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
- VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids
  VOFA combines a high-level visuomotor policy with a low-level force-adaptive controller to let humanoids push objects up to 17 kg to arbitrary goals using only noisy onboard vision, achieving over 80% real-world success.
- SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids
  SixthSense infers whole-body contact events and wrenches in humanoids from proprioception and IMU data alone by tokenizing histories and estimating a sparse contact-event flow with conditional flow matching.
- ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
  ExoActor uses exocentric video generation to implicitly model robot-environment-object interactions and converts the resulting videos into task-conditioned humanoid control sequences.
- X2-N: A Transformable Wheel-legged Humanoid Robot with Dual-mode Locomotion and Manipulation
  X2-N is a transformable wheel-legged humanoid robot with a reinforcement learning whole-body controller that enables dual-mode locomotion and manipulation across varied terrains.
- Sumo: Dynamic and Generalizable Whole-Body Loco-Manipulation
  Test-time steering of pre-trained whole-body policies via sample-based planning lets legged robots generalize dynamic loco-manipulation to varied heavy objects and tasks without additional training or tuning.
- HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation
  HEX is a new framework with humanoid-aligned state representation, mixture-of-experts proprioceptive predictor, history tokens, and residual-gated fusion that achieves state-of-the-art success and generalization on re...
- RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild
  RoSHI is a hybrid wearable that combines sparse IMUs and egocentric SLAM to capture accurate full-body 3D pose and shape data in natural environments for robot learning.
- FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
  FlashSAC scales up Soft Actor-Critic with fewer updates, larger models, higher data throughput, and norm bounds to deliver faster, more stable training than PPO on high-dimensional robot control tasks across dozens of...
- Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control
  NMR uses VAE-based clustered expert physics refinement and a CNN-Transformer to learn dynamics-aware retargeting, eliminating joint jumps and self-collisions on Unitree G1 while accelerating downstream control policies.
- Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
  A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over...
- HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
  HAIC enables robust humanoid interactions with underactuated objects by predicting their dynamics from proprioceptive history and using a world model for adaptive control.
- Switch: Learning Agile Skills Switching for Humanoid Robots
  Switch enables humanoid robots to perform agile, seamless transitions between locomotion skills via a kinematic skill graph, DRL tracking policy, and real-time graph-search scheduler.
- Learning Versatile Humanoid Manipulation with Touch Dreaming
  HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-r...
- Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots
  Tree Learning uses root-branch parameter inheritance and multi-modal adaptation to enable continual multi-skill learning in humanoid robots, achieving higher rewards and 100% retention versus joint training in Unity s...