Coordinating Multiple Conditions for Trajectory-Controlled Human Motion Generation
Pith reviewed 2026-05-14 19:53 UTC · model grok-4.3
pith:BMFR3ZQI Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{BMFR3ZQI}
Prints a linked pith:BMFR3ZQI badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
The pith
CMC coordinates text and trajectory conditions via two-stage diffusion to generate accurate human motions
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By separating trajectory control into a simplified joint generation stage and using the output as partial observations for text-guided full motion inpainting, CMC resolves condition conflicts and representation inconsistencies, achieving state-of-the-art results in both trajectory following and motion realism.
What carries the argument
The divide-and-conquer cascade consisting of trajectory-guided diffusion for controlled joints and text-conditioned diffusion inpainting, with Selective Inpainting Mechanism (SIM) for training stability.
Load-bearing premise
The simplified controlled-joint representation supplies sufficient partial observations for generating consistent full-body motions without artifacts.
What would settle it
Observing frequent motion artifacts or trajectory deviations in the generated full-body motions when the first-stage output is used as input would disprove the effectiveness of the decoupling strategy.
Figures
read the original abstract
Trajectory-controlled human motion generation aims to synthesize realistic human motions conditioned on both textual descriptions and spatial trajectories. However, existing methods suffer from two critical limitations: first, the conflict between text and trajectory conditions disrupts the denoising process, resulting in compromised motion quality or inaccurate trajectory following; second, the use of redundant motion representations introduces inconsistencies between motion components, leading to instability during trajectory control. To address these challenges, we propose CMC, a decoupled framework that effectively coordinates text and trajectory conditions through a divide-and-conquer strategy. CMC follows a divide-and-conquer paradigm, comprising two cascaded stages: Trajectory Control and Motion Completion. In the first stage, a diffusion model generates a simplified representation of the controlled joints under trajectory guidance, based on the given trajectories, ensuring accurate and stable trajectory following. In the second stage, a text-conditioned diffusion inpainting model generates full-body motions using the simplified representation from the first stage as partial observations. To mitigate overfitting caused by limited inpainting training data, we further introduce the Selective Inpainting Mechanism (SIM), which alternates between text-to-motion generation and motion inpainting tasks during training. Experiments on HumanML3D and KIT datasets demonstrate that CMC achieves state-of-the-art performance in control accuracy and motion quality, demonstrating its effectiveness in coordinating multimodal conditions and representations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CMC, a two-stage decoupled diffusion framework for trajectory-controlled human motion generation. Stage 1 trains a diffusion model to produce a simplified representation of controlled joints conditioned on input trajectories. Stage 2 uses a text-conditioned inpainting diffusion model that treats the Stage-1 output as partial observations to synthesize full-body motion, with the Selective Inpainting Mechanism (SIM) alternating between text-to-motion and inpainting tasks to reduce overfitting. Experiments on HumanML3D and KIT are reported to show state-of-the-art control accuracy and motion quality.
Significance. If the two-stage decomposition and SIM prove robust, the work would offer a practical way to resolve conflicts between textual and spatial conditions in motion synthesis while avoiding inconsistencies from redundant representations. The divide-and-conquer design and selective training strategy could generalize to other multimodal conditional generation problems.
major comments (3)
- [Experiments] Experiments section: The SOTA claims on control accuracy and motion quality rest on the assumption that the first-stage simplified controlled-joint representation supplies sufficient partial observations for artifact-free inpainting, yet no ablation replaces Stage-1 outputs with ground-truth partial observations or compares against an end-to-end joint model. This directly tests the core divide-and-conquer premise but is absent.
- [Method] Method description of SIM: The mechanism is introduced to mitigate overfitting from limited inpainting data, but the paper provides no quantitative comparison of training dynamics or final metrics with and without SIM, leaving the contribution to stability unverified.
- [Experiments] Results tables (HumanML3D and KIT): No error bars, standard deviations across runs, or statistical significance tests are reported for the metrics against baselines, weakening the strength of the SOTA conclusion.
minor comments (1)
- [Abstract] The abstract and introduction could more explicitly state the dimensionality or joint subset used in the simplified representation to clarify what information is retained versus discarded.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment point by point below, indicating planned revisions where appropriate to strengthen the validation of our divide-and-conquer approach.
read point-by-point responses
-
Referee: [Experiments] Experiments section: The SOTA claims on control accuracy and motion quality rest on the assumption that the first-stage simplified controlled-joint representation supplies sufficient partial observations for artifact-free inpainting, yet no ablation replaces Stage-1 outputs with ground-truth partial observations or compares against an end-to-end joint model. This directly tests the core divide-and-conquer premise but is absent.
Authors: We agree that directly testing the core premise with ground-truth partial observations from Stage 1 and a comparison to an end-to-end joint model would provide stronger evidence. We will add these ablations in the revised manuscript, reporting control accuracy and motion quality metrics for both settings to quantify the benefit of the decoupled stages. revision: yes
-
Referee: [Method] Method description of SIM: The mechanism is introduced to mitigate overfitting from limited inpainting data, but the paper provides no quantitative comparison of training dynamics or final metrics with and without SIM, leaving the contribution to stability unverified.
Authors: We acknowledge the need for explicit verification of SIM's contribution. In the revision we will include training loss curves and final performance metrics on HumanML3D and KIT comparing the full model against the variant trained without SIM, thereby quantifying its effect on stability and final results. revision: yes
-
Referee: [Experiments] Results tables (HumanML3D and KIT): No error bars, standard deviations across runs, or statistical significance tests are reported for the metrics against baselines, weakening the strength of the SOTA conclusion.
Authors: We will update the results tables to report standard deviations computed over multiple independent runs with different random seeds. Where feasible we will also add statistical significance tests (e.g., paired t-tests) against the strongest baselines to better substantiate the SOTA claims. revision: yes
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper introduces an explicit new two-stage architecture (first-stage trajectory-guided diffusion on simplified controlled-joint representation, second-stage text-conditioned inpainting) plus the Selective Inpainting Mechanism, with performance measured directly against external benchmarks on HumanML3D and KIT. No equations, predictions, or central claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the divide-and-conquer strategy and empirical results remain independent of the inputs they are evaluated on.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Diffusion models can generate realistic human motions when conditioned on text or partial observations
Reference graph
Works this paper leans on
-
[1]
Crowdmogen: Event-driven collective human motion generation.Int
Yukang Cao, Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, and Ziwei Liu. Crowdmogen: Event-driven collective human motion generation.Int. J. Comput. Vis., 134(1):29, 2026
work page 2026
-
[2]
Executing your commands via motion diffusion in latent space
Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, and Gang Yu. Executing your commands via motion diffusion in latent space. InIEEE Conf. Comput. Vis. Pattern Recog., 2023
work page 2023
-
[3]
Hop: Heterogeneous topology-based multimodal entanglement for co-speech gesture generation
Hongye Cheng, Tianyu Wang, Guangsi Shi, Zexing Zhao, and Yanwei Fu. Hop: Heterogeneous topology-based multimodal entanglement for co-speech gesture generation. InIEEE Conf. Comput. Vis. Pattern Recog., 2025
work page 2025
-
[4]
Interaction transformer for human reaction generation.IEEE Trans
Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, and Nicu Sebe. Interaction transformer for human reaction generation.IEEE Trans. Multimedia, 25:8842–8854, 2023
work page 2023
-
[5]
Mofusion: A framework for denoising-diffusion- based motion synthesis
Rishabh Dabral, Muhammad Hamza Mughal, Vladislav Golyanik, and Christian Theobalt. Mofusion: A framework for denoising-diffusion- based motion synthesis. InIEEE Conf. Comput. Vis. Pattern Recog., 2023
work page 2023
-
[6]
Motionlcm: Real-time controllable motion generation via latent consistency model
Wenxun Dai, Ling-Hao Chen, Jingbo Wang, Jinpeng Liu, Bo Dai, and Yansong Tang. Motionlcm: Real-time controllable motion generation via latent consistency model. InEur. Conf. Comput. Vis., 2024
work page 2024
-
[7]
Diffusion models beat gans on image synthesis
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. InAdv. Neural Inform. Process. Syst., pages 8780– 8794, 2021
work page 2021
-
[8]
Cg-hoi: Contact-guided 3d human- object interaction generation
Christian Diller and Angela Dai. Cg-hoi: Contact-guided 3d human- object interaction generation. InIEEE Conf. Comput. Vis. Pattern Recog., 2024
work page 2024
-
[9]
Imos: Intent-driven full-body motion synthesis for human-object interactions
Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, and Philipp Slusallek. Imos: Intent-driven full-body motion synthesis for human-object interactions. InEur. Assoc. Comput. Graph., 2023
work page 2023
-
[10]
Chenyang Gu, Mingyuan Zhang, Haozhe Xie, Zhongang Cai, Lei Yang, and Ziwei Liu. Bridging semantic and kinematic condi- tions with diffusion-based discrete motion tokenizer.arXiv preprint arXiv:2603.19227, 2026
-
[11]
Momask: Generative masked modeling of 3d human motions
Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed, Sen Wang, and Li Cheng. Momask: Generative masked modeling of 3d human motions. InIEEE Conf. Comput. Vis. Pattern Recog., 2024
work page 2024
-
[12]
Generative human motion stylization in latent space
Chuan Guo, Yuxuan Mu, Xinxin Zuo, Peng Dai, Youliang Yan, Juwei Lu, and Li Cheng. Generative human motion stylization in latent space. InInt. Conf. Learn. Represent., 2024. 14
work page 2024
-
[13]
Generating diverse and natural 3d human motions from text
Chuan Guo, Shihao Zou, Xinxin Zuo, Sen Wang, Wei Ji, Xingyu Li, and Li Cheng. Generating diverse and natural 3d human motions from text. InIEEE Conf. Comput. Vis. Pattern Recog., 2022
work page 2022
-
[14]
Action2motion: Conditioned generation of 3d human motions
Chuan Guo, Xinxin Zuo, Sen Wang, Shihao Zou, Qingyao Sun, Annan Deng, Minglun Gong, and Li Cheng. Action2motion: Conditioned generation of 3d human motions. InACM Int. Conf. Multimedia, page 2021–2029, 2020
work page 2021
-
[15]
Xin He, Shaoli Huang, Xiaohang Zhan, Chao Wen, and Ying Shan. Semanticboost: Elevating motion generation with augmented textual cues.arXiv preprint arXiv:2310.20323, 2023
-
[16]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. InAdv. Neural Inform. Process. Syst., 2020
work page 2020
-
[17]
Phase-functioned neural networks for character control.ACM Transactions on Graphics, 2017
Daniel Holden, Taku Komura, and Jun Saito. Phase-functioned neural networks for character control.ACM Transactions on Graphics, 2017
work page 2017
-
[18]
Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, and Ziwei Liu. Avatarclip: Zero-shot text-driven generation and animation of 3d avatars.ACM Transactions on Graphics, 2022
work page 2022
-
[19]
Motiongpt: Human motion as a foreign language
Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, and Tao Chen. Motiongpt: Human motion as a foreign language. InAdv. Neural Inform. Process. Syst., 2024
work page 2024
-
[20]
Local action-guided motion diffusion model for text-to-motion generation
Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, and jie Chen. Local action-guided motion diffusion model for text-to-motion generation. InEur. Conf. Comput. Vis., 2024
work page 2024
-
[21]
Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs
Peng Jin, Yang Wu, Yanbo Fan, Zhongqian Sun, Wei Yang, and Li Yuan. Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs. InAdv. Neural Inform. Process. Syst., 2024
work page 2024
-
[22]
Guided motion diffusion for controllable human motion synthesis
Korrawe Karunratanakul, Konpat Preechakul, Supasorn Suwajanakorn, and Siyu Tang. Guided motion diffusion for controllable human motion synthesis. InInt. Conf. Comput. Vis., pages 2151–2162, 2023
work page 2023
-
[23]
Flame: Free-form language-based motion synthesis & editing
Jihoon Kim, Jiseob Kim, and Sungjoon Choi. Flame: Free-form language-based motion synthesis & editing. InAAAI, 2023
work page 2023
-
[24]
Beat: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis
Haiyang Liu, Zihao Zhu, Naoya Iwamoto, Yichen Peng, Zhengqing Li, You Zhou, Elif Bozkurt, and Bo Zheng. Beat: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis. InEur. Conf. Comput. Vis., 2022
work page 2022
-
[25]
Towards variable and coordinated holistic co-speech motion generation
Yifei Liu, Qiong Cao, Yandong Wen, Huaiguang Jiang, and Changxing Ding. Towards variable and coordinated holistic co-speech motion generation. InIEEE Conf. Comput. Vis. Pattern Recog., 2024
work page 2024
-
[26]
Scamo: Exploring the scaling law in autoregressive motion generation model
Shunlin Lu, Jingbo Wang, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai, and Ruimao Zhang. Scamo: Exploring the scaling law in autoregressive motion generation model. InIEEE Conf. Comput. Vis. Pattern Recog., 2025
work page 2025
-
[27]
Countering language drift with seeded iterated learning
Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, and Aaron Courville. Countering language drift with seeded iterated learning. In Int. Conf. Mach. Learn., 2020
work page 2020
-
[28]
Amass: Archive of motion capture as surface shapes
Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Gerard Pons- Moll, and Michael J Black. Amass: Archive of motion capture as surface shapes. InInt. Conf. Comput. Vis., 2019
work page 2019
-
[29]
Rethinking diffusion for text-driven human motion generation
Zichong Meng, Yiming Xie, Xiaogang Peng, Zeyu Han, and Huaizu Jiang. Rethinking diffusion for text-driven human motion generation. InIEEE Conf. Comput. Vis. Pattern Recog., 2025
work page 2025
-
[30]
Long- term motion generation for interactive humanoid robots using gan with convolutional network
Yusuke Nishimura, Yutaka Nakamura, and Hiroshi Ishiguro. Long- term motion generation for interactive humanoid robots using gan with convolutional network. InCompanion of the ACM/IEEE Int. Conf. Hum.- Robot Interact., page 375–377, 2020
work page 2020
-
[31]
Hoi-diff: Text-driven syn- thesis of 3d human-object interactions using diffusion mod- els
Xiaogang Peng, Yiming Xie, Zizhao Wu, Varun Jampani, Deqing Sun, and Huaizu Jiang. Hoi-diff: Text-driven synthesis of 3d human-object interactions using diffusion models.arXiv preprint arXiv:2312.06553, 2023
-
[32]
Temos: Generating diverse human motions from textual descriptions
Mathis Petrovich, Michael J Black, and G ¨ul Varol. Temos: Generating diverse human motions from textual descriptions. InEur. Conf. Comput. Vis., 2022
work page 2022
-
[33]
Tmr: Text-to-motion retrieval using contrastive 3d human motion synthesis
Mathis Petrovich, Michael J Black, and G ¨ul Varol. Tmr: Text-to-motion retrieval using contrastive 3d human motion synthesis. InInt. Conf. Comput. Vis., 2023
work page 2023
-
[34]
Maskcontrol: Spatio-temporal control for masked motion synthesis
Ekkasit Pinyoanuntapong, Muhammad Saleem, Korrawe Karun- ratanakul, Pu Wang, Hongfei Xue, Chen Chen, Chuan Guo, Junli Cao, Jian Ren, and Sergey Tulyakov. Maskcontrol: Spatio-temporal control for masked motion synthesis. InInt. Conf. Comput. Vis., 2025
work page 2025
-
[35]
Mmm: Generative masked motion model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee, and Chen Chen. Mmm: Generative masked motion model. InIEEE Conf. Comput. Vis. Pattern Recog., 2023
work page 2023
-
[36]
The KIT motion-language dataset.IEEE Trans
Matthias Plappert, Christian Mandery, and Tamim Asfour. The KIT motion-language dataset.IEEE Trans. Big Data, 4(4):236–252, dec 2016
work page 2016
-
[37]
Emotiongesture: Audio-driven diverse emotional co-speech 3d gesture generation.IEEE Trans
Xingqun Qi, Chen Liu, Lincheng Li, Jie Hou, Haoran Xin, and Xin Yu. Emotiongesture: Audio-driven diverse emotional co-speech 3d gesture generation.IEEE Trans. Multimedia, 26:10420–10430, 2024
work page 2024
-
[38]
Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. InIEEE Conf. Comput. Vis. Pattern Recog., 2023
work page 2023
-
[39]
Human motion diffusion as a generative prior
Yoni Shafir, Guy Tevet, Roy Kapon, and Amit Haim Bermano. Human motion diffusion as a generative prior. InInt. Conf. Learn. Represent., 2024
work page 2024
-
[40]
Junyu Shi, Jianqi Zhong, and Wenming Cao. Multi-semantics aggrega- tion network based on the dynamic-attention mechanism for 3d human motion prediction.IEEE Trans. Multimedia, 26:5194–5206, 2024
work page 2024
-
[41]
Mvdream: Multi-view diffusion for 3d generation
Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d generation. InInt. Conf. Learn. Represent., 2024
work page 2024
-
[42]
Kankanhalli, Weidong Geng, and Xiangdong Li
Guofei Sun, Yongkang Wong, Zhiyong Cheng, Mohan S. Kankanhalli, Weidong Geng, and Xiangdong Li. Deepdance: Music-to-dance motion choreography with adversarial learning.IEEE Trans. Multimedia, 23:497–509, 2021
work page 2021
-
[43]
Motionclip: Exposing human motion generation to clip space
Guy Tevet, Brian Gordon, Amir Hertz, Amit H Bermano, and Daniel Cohen-Or. Motionclip: Exposing human motion generation to clip space. InEur. Conf. Comput. Vis., 2022
work page 2022
-
[44]
Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. Human motion diffusion model. InInt. Conf. Learn. Represent., 2023
work page 2023
-
[45]
Neural discrete representation learning
Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. InAdv. Neural Inform. Process. Syst., 2017
work page 2017
-
[46]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdv. Neural Inform. Process. Syst., 2017
work page 2017
-
[47]
Tlcontrol: Trajectory and language control for human motion synthesis
Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, and Lingjie Liu. Tlcontrol: Trajectory and language control for human motion synthesis. InEur. Conf. Comput. Vis., 2024
work page 2024
-
[48]
Tao Wang, Zhihua Wu, Qiaozhi He, Jiaming Chu, Ling Qian, Yu Cheng, Junliang Xing, Jian Zhao, and Lei Jin. Stickmotion: Generating 3d hu- man motions by drawing a stickman.arXiv preprint arXiv:2503.04829, 2025
-
[49]
Humanise: Language-conditioned human motion generation in 3d scenes
Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Wei Liang, and Siyuan Huang. Humanise: Language-conditioned human motion generation in 3d scenes. InAdv. Neural Inform. Process. Syst., 2022
work page 2022
-
[50]
Intercontrol: Zero-shot human interaction generation by controlling every joint
Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, and Bo Dai. Intercontrol: Zero-shot human interaction generation by controlling every joint. InAdv. Neural Inform. Process. Syst., 2024
work page 2024
-
[51]
Omnicontrol: Control any joint at any time for human motion generation
Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, and Huaizu Jiang. Omnicontrol: Control any joint at any time for human motion generation. InInt. Conf. Learn. Represent., 2024
work page 2024
-
[52]
Implicit compositional generative network for length-variable co-speech gesture synthesis.IEEE Trans
Chenghao Xu, Jiexi Yan, Yanhua Yang, and Cheng Deng. Implicit compositional generative network for length-variable co-speech gesture synthesis.IEEE Trans. Multimedia, 26:6325–6335, 2024
work page 2024
-
[53]
Guiding human-object interactions with rich geometry and relations
Mengqing Xue, Yifei Liu, Ling Guo, Shaoli Huang, and Changxing Ding. Guiding human-object interactions with rich geometry and relations. InIEEE Conf. Comput. Vis. Pattern Recog., 2025
work page 2025
-
[54]
Generating human interaction motions in scenes with text control
Hongwei Yi, Justus Thies, Michael J Black, Xue Bin Peng, and Davis Rempe. Generating human interaction motions in scenes with text control. InEur. Conf. Comput. Vis., 2024
work page 2024
-
[55]
Speech gesture generation from the trimodal context of text, audio, and speaker identity
Youngwoo Yoon, Bok Cha, Joo-Haeng Lee, Minsu Jang, Jaeyeon Lee, Jaehong Kim, and Geehyuk Lee. Speech gesture generation from the trimodal context of text, audio, and speaker identity. InACM SIGGRAPH Conf. Comput. Graph. Interact. Tech. Asia, 2020
work page 2020
-
[56]
Divdiff: A conditional diffusion model for diverse human motion pre- diction.IEEE Trans
Hua Yu, Yaqing Hou, Wenbin Pei, Yew-Soon Ong, and Qiang Zhang. Divdiff: A conditional diffusion model for diverse human motion pre- diction.IEEE Trans. Multimedia, pages 1–12, 2024
work page 2024
-
[57]
T2m-gpt: Generating human motion from textual descriptions with discrete representations
Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, and Xi Shen. T2m-gpt: Generating human motion from textual descriptions with discrete representations. InIEEE Conf. Comput. Vis. Pattern Recog., 2023
work page 2023
-
[58]
Adding conditional control to text-to-image diffusion models
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. InInt. Conf. Comput. Vis., pages 3836–3847, 2023
work page 2023
-
[59]
Motiondiffuse: Text-driven human motion generation with diffusion model.IEEE Trans
Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, and Ziwei Liu. Motiondiffuse: Text-driven human motion generation with diffusion model.IEEE Trans. Pattern Anal. Mach. Intell., 46(6):4115–4128, 2024
work page 2024
-
[60]
Smoodi: Stylized motion diffusion model
Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, and Huaizu Jiang. Smoodi: Stylized motion diffusion model. InEur. Conf. Comput. Vis., 2024
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.