BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion
Pith reviewed 2026-05-15 23:09 UTC · model grok-4.3
The pith
A compact motion-tracking setup plus classifier-guided latent diffusion lets one humanoid policy master diverse agile skills and solve unseen tasks zero-shot.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BeyondMimic establishes that a compact motion-tracking formulation masters a wide range of agile behaviors, including aerial cartwheels, spin-kicks, flip-kicks, and sprinting, under a single setup with shared hyperparameters, while achieving state-of-the-art naturalness. A unified latent diffusion model then uses classifier guidance to enable goal specification, seamless task switching, and dynamic composition, allowing the system to solve downstream tasks never seen in training, such as motion inpainting, joystick teleoperation, and obstacle avoidance, with zero-shot transfer to real hardware.
What carries the argument
Unified latent diffusion model with classifier guidance, which steers generation toward novel objectives at test time while preserving the motions learned from the compact tracking stage.
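To make the mechanism concrete, here is a minimal sketch of classifier-guided reverse diffusion sampling. The denoiser interface, the guidance cost, and the DDPM-style update are generic illustrative assumptions, not the paper's actual implementation.

```python
import torch

@torch.no_grad()
def guided_sample(denoiser, guidance_cost, betas, shape, guidance_scale=1.0):
    """One pass of classifier-guided reverse diffusion (generic DDPM-style sketch).
    `denoiser(x, t)` predicts the noise added at step t; `guidance_cost(x, t)`
    is a differentiable task cost whose gradient steers sampling."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                         # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t)                       # predicted noise at step t
        # Unguided DDPM posterior mean for x_{t-1}.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        # Classifier guidance: shift the mean against the gradient of the task cost.
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            grad = torch.autograd.grad(guidance_cost(x_in, t).sum(), x_in)[0]
        mean = mean - guidance_scale * betas[t] * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise    # sample x_{t-1}
    return x
```

With a goal-reaching or obstacle cost plugged in as `guidance_cost`, the same trained model can be steered toward objectives it never saw during training, which is the versatility claim at stake.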
If this is right
- One fixed training setup suffices for many agile skills without motion-specific tuning.
- Classifier guidance extends the model to tasks absent from training data.
- Skills transfer zero-shot from simulation to physical humanoid hardware.
- Behaviors can be composed and switched dynamically for complex sequences.
Where Pith is reading between the lines
- The method may scale to longer-horizon tasks by chaining multiple guided diffusion steps under a high-level planner.
- It could reduce reliance on hand-designed reward functions when training controllers for new environments.
- Similar guidance techniques might apply to other robot morphologies once a base motion library exists.
Load-bearing premise
Classifier guidance during diffusion sampling can reliably steer outputs toward novel objectives while keeping motions natural and stable, without any task-specific retraining.
What would settle it
Running the guided diffusion on a new task such as obstacle avoidance and observing either collisions or visibly unnatural, unstable motions on the real robot.
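That experiment needs little more than a differentiable task cost for the guidance term. A hypothetical obstacle-avoidance cost of the kind such a test might use could look like the following; the latent decoder and obstacle layout are assumptions, not the paper's API.

```python
import torch

def obstacle_cost(latent, t, decode_base_xy, obstacles, margin=0.3):
    """Hypothetical guidance cost penalizing planned base positions that come
    within `margin` meters of circular obstacles.
    `decode_base_xy(latent, t)` maps the latent plan to (T, 2) base positions;
    `obstacles` is an (N, 3) tensor of (x, y, radius)."""
    xy = decode_base_xy(latent, t)                               # (T, 2) planned positions
    dist = torch.cdist(xy, obstacles[:, :2]) - obstacles[:, 2]   # (T, N) clearance to each obstacle
    return torch.clamp(margin - dist, min=0.0).pow(2).sum()      # hinge penalty inside the margin
```

Collisions or visibly unstable motions under such a cost on the real robot would count against the load-bearing premise; clean avoidance would support it.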
Original abstract
The human-like form of humanoid robots positions them uniquely to achieve the agility and versatility in motor skills that humans possess. Learning from human demonstrations offers a scalable approach to acquiring these capabilities. However, prior works either produce unnatural motions or rely on motion-specific tuning to achieve satisfactory naturalness. Furthermore, these methods are often motion- or goal-specific, lacking the versatility to compose diverse skills, especially when solving unseen tasks. We present BeyondMimic, a framework that scales to diverse motions and carries the versatility to compose them seamlessly in tackling unseen downstream tasks. At heart, a compact motion-tracking formulation enables mastering a wide range of radically agile behaviors, including aerial cartwheels, spin-kicks, flip-kicks, and sprinting, with a single setup and shared hyperparameters, all while achieving state-of-the-art human-like performance. Moving beyond the mere imitation of existing motions, we propose a unified latent diffusion model that empowers versatile goal specification, seamless task switching, and dynamic composition of these agile behaviors. Leveraging classifier guidance, a diffusion-specific technique for test-time optimization toward novel objectives, our model extends its capability to solve downstream tasks never encountered during training, including motion inpainting, joystick teleoperation, and obstacle avoidance, and transfers these skills zero-shot to real hardware. This work opens new frontiers for humanoid robots by pushing the limits of scalable human-like motor skill acquisition from human motion and advancing seamless motion synthesis that achieves generalization and versatility beyond training setups.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BeyondMimic, a framework for humanoid robot control that first uses a compact motion-tracking formulation to learn a wide range of agile behaviors (aerial cartwheels, spin-kicks, flip-kicks, sprinting) from human demonstrations under a single setup and shared hyperparameters. It then employs a unified latent diffusion model with classifier guidance to enable versatile goal specification, seamless composition, and solution of unseen downstream tasks (motion inpainting, joystick teleoperation, obstacle avoidance), with zero-shot transfer to real hardware and a claim of state-of-the-art human-like naturalness.
Significance. If the empirical claims hold with proper quantitative support, the work would be significant for demonstrating scalable, human-demonstration-driven acquisition of versatile agile skills in humanoids without per-motion or per-task retraining, advancing generalization beyond imitation to test-time guidance for novel objectives and real-world deployment.
major comments (3)
- [Abstract] The central claim of 'state-of-the-art human-like performance' and 'single setup and shared hyperparameters' for radically agile behaviors is not accompanied by any quantitative metrics, baseline comparisons, or ablation results in the manuscript description, leaving the performance advantage and hyperparameter invariance unverified.
- [Abstract] The assertion, in the abstract and main claims, that a fixed classifier guidance scale (the only listed free parameter) steers sampling to unseen tasks while preserving naturalness and hardware stability without per-task rescaling or auxiliary losses is load-bearing for the 'no retraining, single setup' contribution, yet no evidence or sensitivity analysis is provided to rule out the need for task-specific tuning of this coefficient.
- [Abstract] The zero-shot hardware transfer claim for downstream tasks is presented without reported failure-mode analysis, stability metrics on hardware, or comparison to task-specific baselines, which is necessary to substantiate that the diffusion model generalizes without post-hoc adjustments.
minor comments (1)
- Notation for the latent diffusion model and classifier guidance could be clarified with explicit equations showing how the guidance term is added during sampling, as the current description leaves the precise formulation ambiguous.
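For reference, the standard classifier-guidance formulation from the diffusion literature, which this comment presumably has in mind, is sketched below; the paper's latent-space variant may differ in detail.

```latex
% Generic classifier guidance (standard form, not necessarily the paper's exact formulation):
% the conditional score is approximated by adding a weighted guidance gradient,
\nabla_{x_t} \log p(x_t \mid y) \;\approx\; \nabla_{x_t} \log p(x_t) + w \, \nabla_{x_t} \log p_{\phi}(y \mid x_t),
% which shifts the mean of each reverse step before sampling:
\tilde{\mu}_{\theta}(x_t, y) = \mu_{\theta}(x_t) + w \, \Sigma_{\theta}(x_t) \, \nabla_{x_t} \log p_{\phi}(y \mid x_t),
\qquad x_{t-1} \sim \mathcal{N}\bigl(\tilde{\mu}_{\theta}(x_t, y), \, \Sigma_{\theta}(x_t)\bigr).
```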
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate revisions to strengthen the presentation of our results and claims.
Point-by-point responses
- Referee: [Abstract] The central claim of 'state-of-the-art human-like performance' and 'single setup and shared hyperparameters' for radically agile behaviors is not accompanied by any quantitative metrics, baseline comparisons, or ablation results in the manuscript description, leaving the performance advantage and hyperparameter invariance unverified.
  Authors: We agree that the abstract should explicitly reference supporting quantitative evidence. The full manuscript contains detailed quantitative evaluations, including success rates, naturalness metrics, baseline comparisons, and ablations confirming the single-setup performance across agile behaviors. In the revised version we will update the abstract to include key metrics and comparisons drawn from these experiments. revision: yes
- Referee: [Abstract] The assertion, in the abstract and main claims, that a fixed classifier guidance scale (the only listed free parameter) steers sampling to unseen tasks while preserving naturalness and hardware stability without per-task rescaling or auxiliary losses is load-bearing for the 'no retraining, single setup' contribution, yet no evidence or sensitivity analysis is provided to rule out the need for task-specific tuning of this coefficient.
  Authors: The experiments in the manuscript apply a single fixed guidance scale to multiple downstream tasks and report consistent success without per-task retuning. To directly address the request for evidence, we will add a sensitivity analysis section in the revision that varies the scale over a range and reports resulting task performance and stability metrics. revision: partial
- Referee: [Abstract] The zero-shot hardware transfer claim for downstream tasks is presented without reported failure-mode analysis, stability metrics on hardware, or comparison to task-specific baselines, which is necessary to substantiate that the diffusion model generalizes without post-hoc adjustments.
  Authors: We acknowledge that additional hardware-specific analysis would strengthen the zero-shot transfer claims. The manuscript reports successful real-world deployment, but we will expand the hardware section in revision to include failure-mode statistics, quantitative stability metrics, and available comparisons to task-specific baselines. revision: yes
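The sensitivity analysis promised in the second response could be as small as a sweep over the guidance scale with success and stability logged per task. A minimal sketch follows; the rollout interface and metric names are assumptions, not reported quantities.

```python
import numpy as np

def sweep_guidance_scale(rollout_fn, scales=(0.0, 0.5, 1.0, 2.0, 4.0, 8.0), episodes=20):
    """Hypothetical sensitivity sweep: evaluate the guided controller at each scale
    so any need for per-task retuning of the coefficient becomes visible.
    `rollout_fn(scale)` is assumed to return a dict with 'success', 'fell',
    and 'tracking_error' for one episode."""
    summary = {}
    for scale in scales:
        runs = [rollout_fn(scale) for _ in range(episodes)]
        summary[scale] = {
            "success_rate": float(np.mean([r["success"] for r in runs])),
            "fall_rate": float(np.mean([r["fell"] for r in runs])),
            "tracking_error": float(np.mean([r["tracking_error"] for r in runs])),
        }
    return summary
```

A flat success and fall profile across scales and tasks would support the fixed-coefficient claim; a narrow per-task sweet spot would undercut it.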
Circularity Check
No circularity: empirical framework with independent experimental validation
full rationale
The paper's core claims rest on training a compact motion-tracking policy and a latent diffusion model, followed by test-time classifier guidance for novel tasks. No derivation chain reduces any result to its inputs by construction; there are no equations presented that equate a 'prediction' to a fitted parameter or rename an input as an output. Versatility and zero-shot transfer are asserted via empirical results on agile motions and hardware transfer, which remain falsifiable outside the training distribution. Self-citations, if present, are not load-bearing for the central claims.
Axiom & Free-Parameter Ledger
free parameters (1)
- classifier guidance scale
axioms (1)
- domain assumption: Human motion capture data contains sufficient coverage of agile behaviors for generalization via diffusion
Forward citations
Cited by 19 Pith papers
- ReActor: Reinforcement Learning for Physics-Aware Motion Retargeting
  ReActor jointly optimizes motion retargeting and RL policy training with an approximate gradient to generate physically consistent robot motions from human references using only sparse body correspondences.
- TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos
  TT4D delivers a large-scale dataset of high-fidelity 3D table tennis gameplay reconstructed from monocular videos using a novel lift-first pipeline that infers ball trajectories and spin while handling occlusions.
- Physics-Informed Reinforcement Learning of Spatial Density Velocity Potentials for Map-Free Racing
  A DRL policy learns racing controls from depth spectral distributions using a non-geometric physics-informed reward, achieving 12% better performance than humans on out-of-distribution tracks with under 1% of baseline...
- Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids
  Rhythm transfers interactive whole-body behaviors from simulation to real dual Unitree G1 humanoids via interaction-aware retargeting and graph-reward RL.
- BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
  BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
- VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids
  VOFA combines a high-level visuomotor policy with a low-level force-adaptive controller to let humanoids push objects up to 17 kg to arbitrary goals using only noisy onboard vision, achieving over 80% real-world success.
- SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids
  SixthSense infers whole-body contact events and wrenches in humanoids from proprioception and IMU data alone by tokenizing histories and estimating a sparse contact-event flow with conditional flow matching.
- ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
  ExoActor uses exocentric video generation to implicitly model robot-environment-object interactions and converts the resulting videos into task-conditioned humanoid control sequences.
- X2-N: A Transformable Wheel-legged Humanoid Robot with Dual-mode Locomotion and Manipulation
  X2-N is a transformable wheel-legged humanoid robot with a reinforcement learning whole-body controller that enables dual-mode locomotion and manipulation across varied terrains.
- Sumo: Dynamic and Generalizable Whole-Body Loco-Manipulation
  Test-time steering of pre-trained whole-body policies via sample-based planning lets legged robots generalize dynamic loco-manipulation to varied heavy objects and tasks without additional training or tuning.
- HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation
  HEX is a new framework with humanoid-aligned state representation, mixture-of-experts proprioceptive predictor, history tokens, and residual-gated fusion that achieves state-of-the-art success and generalization on re...
- RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild
  RoSHI is a hybrid wearable that combines sparse IMUs and egocentric SLAM to capture accurate full-body 3D pose and shape data in natural environments for robot learning.
- FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
  FlashSAC scales up Soft Actor-Critic with fewer updates, larger models, higher data throughput, and norm bounds to deliver faster, more stable training than PPO on high-dimensional robot control tasks across dozens of...
- Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control
  NMR uses VAE-based clustered expert physics refinement and a CNN-Transformer to learn dynamics-aware retargeting, eliminating joint jumps and self-collisions on Unitree G1 while accelerating downstream control policies.
- Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
  A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over...
- HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
  HAIC enables robust humanoid interactions with underactuated objects by predicting their dynamics from proprioceptive history and using a world model for adaptive control.
- Switch: Learning Agile Skills Switching for Humanoid Robots
  Switch enables humanoid robots to perform agile, seamless transitions between locomotion skills via a kinematic skill graph, DRL tracking policy, and real-time graph-search scheduler.
- Learning Versatile Humanoid Manipulation with Touch Dreaming
  HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-r...
- Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots
  Tree Learning uses root-branch parameter inheritance and multi-modal adaptation to enable continual multi-skill learning in humanoid robots, achieving higher rewards and 100% retention versus joint training in Unity s...