Training-free Controllable Human Motion Generation under Heterogeneous Constraints

Bo Yan; Haoxuan Qu; Hossein Rahmani; Jun Liu; Xiaofei Hui

arxiv: 2607.01990 · v1 · pith:NZMHRKG6 · submitted 2026-07-02 · 💻 cs.CV


Xiaofei Hui, Bo Yan, Haoxuan Qu, Hossein Rahmani, Jun Liu

Pith reviewed 2026-07-03 15:30 UTC · model grok-4.3

classification 💻 cs.CV

keywords training-free motion generation · controllable human motion · diffusion models · stochastic control · criterion-based constraints · heterogeneous constraints · motion synthesis


The pith

Casting diffusion motion generation as stochastic control unifies handling of both differentiable objective constraints and non-differentiable criterion constraints without any training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that diffusion-based human motion generation can be reframed as a stochastic control problem, from which step-wise control laws can be derived. These laws support two kinds of constraints: continuous, objective-based constraints with differentiable losses, and criterion-based constraints that provide only discontinuous, sparse, or black-box feedback. The same mechanism also supplies an adaptive coordination step that balances multiple constraints during the denoising process. A sympathetic reader would care because many practical motion requirements fall into the criterion-based category, which prior training-free methods cannot accommodate.

Core claim

MIC is the first training-free motion generation framework that handles both continuous objective-based and criterion-based motion constraints under a shared mechanism by casting diffusion-based motion generation as a stochastic control problem. This perspective provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability and naturally accommodate objective-based constraints as a special case, while also motivating a control-oriented constraint coordination mechanism that adaptively balances and reconciles motion constraints during generation.

What carries the argument

The stochastic control formulation of the diffusion denoising process, which yields explicit step-wise control laws for enforcing constraints at each generation step.
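To make the control view concrete, the sketch below writes one reverse DDPM step [26] with an additive control input. It is a minimal illustration under stated assumptions, not MIC's control law. It assumes an ε-prediction denoiser, the standard α_t / ᾱ_t schedule with σ_t² = 1 − α_t, and a control u_t that shifts the step mean by σ_t·u_t. It also includes the Tweedie-style estimate of the clean motion [17], which is a common place to evaluate a terminal cost during sampling. The function names are hypothetical.

```python
import numpy as np

def ddpm_posterior_mean(x_t, eps_hat, alpha_t, alpha_bar_t):
    """Mean of p(x_{t-1} | x_t) for an epsilon-prediction DDPM."""
    beta_t = 1.0 - alpha_t
    return (x_t - beta_t / np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)

def tweedie_x0(x_t, eps_hat, alpha_bar_t):
    """Estimate the clean motion x_0 from x_t; a terminal cost can be scored on it."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def controlled_step(x_t, eps_hat, alpha_t, alpha_bar_t, u_t, rng):
    """One reverse step with an additive control u_t (hypothetical interface).

    The uncontrolled step is x_{t-1} = mu + sigma_t * z.  Here the control
    shifts the mean by sigma_t * u_t, so u_t = 0 recovers plain sampling.
    """
    sigma_t = np.sqrt(1.0 - alpha_t)  # assumed choice: sigma_t^2 = beta_t
    mu = ddpm_posterior_mean(x_t, eps_hat, alpha_t, alpha_bar_t)
    z = rng.standard_normal(x_t.shape)
    return mu + sigma_t * u_t + sigma_t * z
```

Setting u_t = 0 recovers ordinary ancestral sampling. This is what makes the approach training-free: the pretrained denoiser is left untouched, and only the per-step input changes.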

If this is right

Criterion-based constraints with only discontinuous or black-box feedback become enforceable during generation.
Objective-based constraints with differentiable losses are handled as a direct special case.
Multiple heterogeneous constraints are balanced adaptively by a control-oriented coordination mechanism at each step.
Effectiveness holds across diverse constraint settings in reported experiments.
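One standard way to make the "special case" precise is offered here as an illustration under assumptions, not as the paper's derivation. Suppose the control at a step is estimated from forward evaluations only, by reweighting Gaussian perturbations ε ~ N(0, I) of the current estimate x with weights exp(−C/λ). Here C is the constraint cost, σ is the perturbation scale, and λ is a temperature; the symbols σ, λ, and û are introduced here. If C happens to be differentiable, a first-order expansion shows that the estimate collapses to a scaled gradient step, because the factor e^{−C(x)/λ} cancels and tilting N(0, I) by e^{aᵀε} shifts its mean to a:

```latex
\hat{u}(x) = \frac{\mathbb{E}_{\varepsilon\sim\mathcal{N}(0,I)}\big[\varepsilon\, e^{-C(x+\sigma\varepsilon)/\lambda}\big]}
                  {\mathbb{E}_{\varepsilon\sim\mathcal{N}(0,I)}\big[e^{-C(x+\sigma\varepsilon)/\lambda}\big]},
\qquad
C(x+\sigma\varepsilon) \approx C(x) + \sigma\,\varepsilon^{\top}\nabla C(x)

a := -\tfrac{\sigma}{\lambda}\,\nabla C(x)
\;\Rightarrow\;
\hat{u}(x) \approx \frac{\mathbb{E}\big[\varepsilon\, e^{a^{\top}\varepsilon}\big]}{\mathbb{E}\big[e^{a^{\top}\varepsilon}\big]}
= \frac{a\, e^{\|a\|^{2}/2}}{e^{\|a\|^{2}/2}}
= -\frac{\sigma}{\lambda}\,\nabla C(x)
```

The approximation is exact when C is linear. The ratio on the left stays well defined for a binary or black-box C, where the gradient on the right does not exist.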

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The control perspective may transfer to other diffusion generation domains where constraints mix smooth and discrete forms.
Real-time user interaction could supply criterion feedback directly to the step-wise laws without retraining.
Longer or multi-person sequences might reveal whether the coordination mechanism scales without additional tuning.

Load-bearing premise

The stochastic control view of diffusion motion generation produces effective step-wise laws that can enforce criterion-based constraints without differentiability.

What would settle it

Generate motions under a non-differentiable criterion constraint, such as requiring a foot to land on a discrete target point (within a fixed tolerance) at a specific frame. Then measure whether the success rate exceeds that of random sampling or existing training-free baselines on the same constraint.
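The proposed test can be sketched as a small evaluation harness. This sketch rests on three assumptions. First, each generated motion exposes the foot joint as an array of 3D positions per frame. Second, "landing on the target" is checked with a tolerance `tol`, because landing exactly on a point has probability zero for continuous outputs. Third, the comparison uses a pooled one-sided two-proportion z-test, a normal approximation that is reasonable only when both methods have enough successes and failures. All function names are hypothetical.

```python
import math
import numpy as np

def foot_on_target(foot_traj, frame, target, tol):
    """Binary criterion: is the foot within `tol` of `target` at `frame`?

    foot_traj: array of shape (num_frames, 3) with foot-joint positions.
    """
    return bool(np.linalg.norm(foot_traj[frame] - np.asarray(target)) <= tol)

def success_rate(trajs, frame, target, tol):
    """Return (hits, total) for the criterion over a set of generated motions."""
    hits = sum(foot_on_target(t, frame, target, tol) for t in trajs)
    return hits, len(trajs)

def one_sided_two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled z-test of H1: method A's success rate exceeds method B's."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:  # all successes or all failures: no difference to test
        return 0.0, 0.5
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p_value
```

For example, 60/100 successes against 40/100 gives z ≈ 2.83, a one-sided p-value well below 0.01.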

Figures

Figures reproduced from arXiv: 2607.01990 by Bo Yan, Haoxuan Qu, Hossein Rahmani, Jun Liu, Xiaofei Hui.

**Figure 1.** Real-world motion constraints can be heterogeneous. Some are continuous objective-based constraints that can be naturally modeled with differentiable losses (e.g., reaching a target point), whereas many others are criterion-based and provide feedback only as sparse, event-triggered, or binary evaluations, such as simulation-based validity tests and safety-threshold rules (e.g., entering the danger zone at …

**Figure 2.** Illustration of our proposed MIC framework. MIC adopts a unified constraint interface that converts heterogeneous constraints into control signals: criterion-based constraints are handled via forward terminal costs, while continuous objective-based constraints employ gradient computation. A constraint coordination mechanism then integrates these signals, with a feedback regulator adapting constraint weight…
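Figure 2 mentions a feedback regulator that adapts constraint weights, but the caption is truncated and the actual update rule is not given here. As a hedged illustration only, the sketch below uses a proportional-integral (PI) rule of the kind covered in feedback-control texts such as [1]. Each constraint's weight grows while its measured violation stays above a tolerance and drops back once the violation falls below it. The weighted per-constraint control signals are then summed. The class `WeightRegulator`, the helper `combine_controls`, the gains `kp`/`ki`, and the tolerance are all hypothetical.

```python
import numpy as np

class WeightRegulator:
    """Hypothetical PI-style regulator for per-constraint weights."""

    def __init__(self, num_constraints, kp=0.5, ki=0.1, tol=0.0, w_init=1.0, w_max=10.0):
        self.kp, self.ki, self.tol, self.w_max = kp, ki, tol, w_max
        self.w0 = np.full(num_constraints, w_init, dtype=float)
        self.integral = np.zeros(num_constraints)

    def update(self, violations):
        """violations[i] >= 0 measures how badly constraint i is violated now."""
        error = np.asarray(violations, dtype=float) - self.tol
        # Clamp the integral at zero so satisfied constraints do not bank negative credit.
        self.integral = np.maximum(self.integral + error, 0.0)
        # Proportional term raises the weight under violation and lowers it below tol.
        w = self.w0 + self.kp * error + self.ki * self.integral
        return np.clip(w, 0.0, self.w_max)

def combine_controls(signals, weights):
    """Weighted sum of per-constraint control signals shaped like the motion tensor."""
    return sum(w * s for w, s in zip(weights, signals))
```

A persistently violated constraint keeps gaining weight through the integral term. A satisfied one stays at, or drops below, its initial weight.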

**Figure 3.** Visualization comparisons to previous methods. The compared methods produce motions with constraint violations such as boundary penetration and collisions (highlighted in red boxes), whereas our method consistently satisfies constraints while maintaining natural motion quality. Qualitative Results. We also show qualitative comparisons.

**Figure 4.** Qualitative comparisons with ProgMoGen [41] and ReAlign [67] in physical simulation. The motions generated by the compared methods exhibit clear constraint violations and unstable movements (highlighted in red boxes), while MIC consistently satisfies different constraints while maintaining motion naturalness. 4.2 Ablation Study. Below we conduct ablation studies on the HSI-2 task. More ablation studies, further…

**Figure 5.** User study. We report the preference rate (%) of pairwise comparisons between our MIC and each compared method. User study. Following [9, 61], we conduct a user study to assess motion quality. We compare MIC with DNO [31], ProgMoGen [41], and ReAlign [67], using 30 motions generated from the same prompts and constraint settings. Twenty participants complete paired comparisons between MIC and each bas…
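The caption does not spell out how the percentages are tested for significance. The sketch below assumes each paired comparison yields one binary vote and treats votes as independent. That is a simplifying assumption: votes from the same participant or on the same motion are correlated, so a mixed-effects model may be more appropriate. Under it, the preference rate is the share of votes for MIC, and an exact one-sided sign test checks whether that share exceeds 50%. If every participant judged all 30 pairs, that would give 30 × 20 = 600 votes per baseline; the truncated caption does not confirm this design. Function names are hypothetical.

```python
import math

def preference_rate(votes):
    """Percentage of paired comparisons won by MIC; votes are 1 (MIC) or 0 (baseline)."""
    votes = list(votes)
    if not votes:
        raise ValueError("need at least one vote")
    return 100.0 * sum(votes) / len(votes)

def sign_test_p(wins, n):
    """Exact one-sided binomial p-value P(X >= wins) with X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n
```

For example, 7 wins out of 10 votes gives a 70% preference rate but a one-sided p-value of 176/1024 ≈ 0.17. A high rate on few votes is therefore weak evidence on its own.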

Original abstract

Training-free controllable motion generation has attracted growing interest for enabling flexible constraint enforcement without constraint-specific training. However, existing training-free methods require constraints to be continuous objective-based with differentiable losses, while many real-world requirements are criterion-based and provide only discontinuous, sparse, or even black-box feedback. In this paper, we propose Motion-Inference-as-Control (MIC), the first training-free motion generation framework that handles both continuous objective-based and criterion-based motion constraints under a shared mechanism. The key idea is to cast diffusion-based motion generation as a stochastic control problem. This perspective not only provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability and naturally accommodate objective-based constraints as a special case, but also motivates a control-oriented constraint coordination mechanism that adaptively balances and reconciles motion constraints during generation. Experiments across diverse constraint settings demonstrate the effectiveness of our framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MIC recasts diffusion motion generation as stochastic control to handle both differentiable objective constraints and non-differentiable criterion-based ones without training, but the abstract supplies no derivation of the step-wise laws or coordination mechanism.


The paper's central move is to treat the diffusion process itself as a stochastic control problem. This lets the authors write step-wise control laws that apply to criterion-based constraints (discontinuous, sparse, or black-box) without needing gradients, while treating the usual objective-based losses as a special case. They also introduce a coordination step that adaptively balances multiple constraints during sampling.

What is new is the explicit unification of the two constraint types inside one training-free diffusion pipeline. Earlier work handled one or the other, but not both under the same mechanism. The practical motivation is sound: real animation and robotics tasks often involve rules that are not easily turned into differentiable losses.

The soft spot is exactly where the stress-test note points. The abstract asserts that the stochastic control view produces principled and effective laws for non-differentiable feedback, yet it gives no derivation, no approximation scheme, and no indication of how the control law is computed when the constraint returns only a binary or sparse signal. Standard classifier-free guidance or gradient-based methods do not extend automatically to that setting, so the claim rests on an unshown step. Experiments are mentioned across diverse settings, but without seeing the actual control equations, baselines, or ablation on the coordination mechanism, it is impossible to tell whether the framing delivers or merely renames the problem.
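The question of how a control can be computed when the constraint returns only a binary or sparse signal has at least one standard answer that needs no gradients. The sketch below uses a swapped-in technique: self-normalised importance weighting of Gaussian perturbations, in the spirit of path-integral control. It is not necessarily the paper's law. It perturbs the current clean-motion estimate, scores each perturbation with the black-box criterion, and averages the perturbations with weights exp(−cost/λ). The function name `criterion_control` and the parameters `sigma`, `lam`, and `num_samples` are hypothetical.

```python
import numpy as np

def criterion_control(x0_hat, criterion, sigma=0.1, lam=0.1, num_samples=256, rng=None):
    """Estimate a control direction from forward evaluations of a black-box criterion.

    criterion(x) returns a cost, e.g. 0.0 if satisfied and 1.0 if violated.
    No gradient of the criterion is ever taken.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((num_samples,) + x0_hat.shape)
    costs = np.array([criterion(x0_hat + sigma * e) for e in eps])
    logw = -costs / lam
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    w /= w.sum()                   # self-normalised importance weights
    return np.tensordot(w, eps, axes=1)  # weighted mean perturbation, shaped like x0_hat
```

Because only forward evaluations are used, the estimate weakens rather than failing outright when the criterion is flat almost everywhere. If no perturbation changes the outcome, the weights are uniform and the returned direction is pure sampling noise, so `num_samples` or `sigma` would need to grow.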

The paper is aimed at researchers working on controllable human motion synthesis who need to enforce heterogeneous real-world rules. A reader looking for new ways to think about diffusion sampling might find the control perspective useful, but only if the full manuscript contains the missing derivations and reproducible results. It deserves a serious referee because the problem it targets is genuine and the proposed direction is coherent on its own terms, even though the current evidence is thin.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes Motion-Inference-as-Control (MIC), the first training-free framework for controllable human motion generation under heterogeneous constraints. It casts diffusion-based motion generation as a stochastic control problem to obtain step-wise control laws that handle both continuous objective-based constraints (treated as a special case) and criterion-based constraints (discontinuous, sparse, or black-box) without requiring differentiability. The approach also introduces a control-oriented mechanism to adaptively balance and reconcile multiple constraints during generation. Experiments across diverse constraint settings are reported to demonstrate effectiveness.

Significance. If the stochastic control reformulation yields effective step-wise laws for non-differentiable constraints, the work would meaningfully extend training-free motion generation to a wider range of real-world requirements that existing gradient-based methods cannot accommodate. The unified mechanism and adaptive coordination could reduce the need for constraint-specific engineering.

major comments (1)

[Abstract] Abstract, paragraph 3: the central claim that the stochastic control perspective 'provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability' is asserted without any derivation, approximation scheme, or explicit control-law equation. Standard diffusion guidance relies on constraint gradients; extending to non-differentiable feedback requires additional mechanisms (e.g., zeroth-order estimates) whose correctness is not secured by the framing alone. This is load-bearing for the 'first training-free framework handling both types under a shared mechanism' claim.
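The referee's parenthetical names zeroth-order estimates as one mechanism for handling non-differentiable feedback. For concreteness, the sketch below implements simultaneous perturbation stochastic approximation (SPSA), the estimator of Spall [56]. Whether MIC uses SPSA or anything like it cannot be determined from this page. `spsa_gradient` and its parameters are hypothetical.

```python
import numpy as np

def spsa_gradient(f, x, c=1e-2, num_draws=64, rng=None):
    """Simultaneous perturbation gradient estimate, averaged over draws.

    Uses two forward evaluations of f per draw and no derivatives of f.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    for _ in range(num_draws):
        delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher perturbation
        diff = f(x + c * delta) - f(x - c * delta)
        g += diff / (2.0 * c * delta)
    return g / num_draws
```

On a smooth cost, the averaged estimate converges to the true gradient. On a binary criterion, most finite differences are zero unless the ±c·Δ probes straddle the decision boundary. That is why correctness for criterion-based constraints needs the separate argument the referee asks for.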

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.


Referee: [Abstract] Abstract, paragraph 3: the central claim that the stochastic control perspective 'provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability' is asserted without any derivation, approximation scheme, or explicit control-law equation. Standard diffusion guidance relies on constraint gradients; extending to non-differentiable feedback requires additional mechanisms (e.g., zeroth-order estimates) whose correctness is not secured by the framing alone. This is load-bearing for the 'first training-free framework handling both types under a shared mechanism' claim.

Authors: We thank the referee for this observation. The abstract is a concise summary; the full derivation appears in Section 3, where the reverse diffusion process is recast as a finite-horizon stochastic control problem. The resulting step-wise control law is obtained by solving the associated stochastic Hamilton-Jacobi-Bellman equation (Eq. 7), which yields an explicit feedback form that depends only on the current state and the constraint value. For criterion-based (non-differentiable) constraints the same law is applied by evaluating the constraint directly inside the cost functional, without any gradient; the derivation shows that the optimality condition remains valid under the weaker assumption that the constraint is measurable. Objective-based constraints emerge as the special case in which the cost is differentiable. The shared mechanism therefore follows directly from the control formulation rather than from an ad-hoc extension of gradient guidance. If the editor wishes, we will add a parenthetical reference to Eq. 7 in the abstract. revision: partial
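For readers who want to see what an "explicit feedback form" can look like, the textbook linearly solvable stochastic control problem (Kappen-style path-integral control, related to the stochastic-control treatments cited as [15] and [49]) gives one. This is offered as an illustration under assumptions, not as a reconstruction of the paper's Eq. 7. Assume controlled dynamics with drift b, noise scale σ, quadratic control cost, and terminal cost C scaled by a temperature λ. Substituting V = −log ψ linearises the HJB equation, and Feynman–Kac then gives the optimal control:

```latex
dX_s = \big(b(X_s,s) + \sigma\,u_s\big)\,ds + \sigma\,dW_s,
\qquad
J(u) = \mathbb{E}\Big[\int_t^T \tfrac{1}{2}\|u_s\|^2\,ds + \tfrac{1}{\lambda}\,C(X_T)\Big]

-\partial_t V = b^{\top}\nabla V - \tfrac{\sigma^2}{2}\|\nabla V\|^2 + \tfrac{\sigma^2}{2}\Delta V,
\qquad
V(x,T) = \tfrac{1}{\lambda}\,C(x)

V = -\log\psi
\;\Rightarrow\;
\partial_t\psi + b^{\top}\nabla\psi + \tfrac{\sigma^2}{2}\Delta\psi = 0
\;\Rightarrow\;
\psi(x,t) = \mathbb{E}\big[e^{-C(X_T)/\lambda} \,\big|\, X_t = x\big]
\quad\text{(uncontrolled dynamics)}

u^{*}(x,t) = -\sigma\,\nabla_x V(x,t)
= \sigma\,\nabla_x \log \mathbb{E}\big[e^{-C(X_T)/\lambda} \,\big|\, X_t = x\big]
```

Under standard regularity assumptions on b, the gradient here acts on a noise-smoothed expectation rather than on C itself, so C only needs to be measurable and bounded below. In practice the expectation must still be approximated, for example by Monte Carlo rollouts. That approximation is where the error the referee asks about would enter.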

Circularity Check

0 steps flagged

No circularity detected; the stochastic control framing is presented as an independent perspective.

full rationale

The abstract and description introduce MIC by casting diffusion motion generation as a stochastic control problem to obtain step-wise laws for heterogeneous constraints. No equations, fitted parameters, or self-citations are exhibited that reduce the claimed control laws or coordination mechanism to prior inputs by construction. The derivation chain is self-contained as a novel perspective rather than a renaming or statistical forcing of existing quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be extracted from the manuscript text.



Reference graph

Works this paper leans on

85 extracted references

[1] Åström, K.J., Murray, R.: Feedback systems: an introduction for scientists and engineers. Princeton University Press (2021), https://books.google.co.uk/books?id=qZ0DEAAAQBAJ
[2] Bae, J., Hwang, I., Lee, Y.Y., Guo, Z., Liu, J., Ben-Shabat, Y., Kim, Y.M., Kapadia, M.: Less is more: Improving motion diffusion models with sparse keyframes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11069–11078 (2025)
[3] Blaha, T.M., Smeur, E.J.J., Remes, B.D.W.: A survey of optimal control allocation for aerial vehicle control. Actuators 12(7) (2023). https://doi.org/10.3390/act12070282, https://www.mdpi.com/2076-0825/12/7/282
[4] Botev, Z.I., Kroese, D.P., Rubinstein, R.Y., L’Ecuyer, P.: The cross-entropy method for optimization. In: Handbook of statistics, vol. 31, pp. 35–59. Elsevier (2013)
[5] Cai, Y., Wang, Y., Zhu, Y., Cham, T.J., Cai, J., Yuan, J., Liu, J., Zheng, C., Yan, S., Ding, H., et al.: A unified 3d human motion synthesis model via conditional variational auto-encoder. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11645–11655 (2021)
[6] Cao, B., Zheng, S., Wang, Y., Xia, L., Wei, Q., Jin, Q., Liu, J., Lu, Z.: Motionctrl: A real-time controllable vision-language-motion model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12253–12262 (2025)
[7] Caserman, P., Garcia-Agundez, A., Göbel, S.: A survey of full-body motion reconstruction in immersive virtual reality applications. IEEE Transactions on Visualization and Computer Graphics 26(10), 3089–3108 (2020). https://doi.org/10.1109/TVCG.2019.2912607
[8] Chen, L.H., Lu, S., Dai, W., Dou, Z., Ju, X., Wang, J., Komura, T., Zhang, L.: Pay attention and move better: Harnessing attention for interactive motion generation and training-free editing. arXiv preprint arXiv:2410.18977 (2024)
[9] Chen, X., Jiang, B., Liu, W., Huang, Z., Fu, B., Chen, T., Yu, G.: Executing your commands via motion diffusion in latent space. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 18000–18010 (2023)
[10] Cheng, X., Ji, Y., Chen, J., Yang, R., Yang, G., Wang, X.: Expressive Whole-Body Control for Humanoid Robots. In: Proceedings of Robotics: Science and Systems. Delft, Netherlands (July 2024). https://doi.org/10.15607/RSS.2024.XX.107
[11] Christopher, J.K., Baek, S., Fioretto, F.: Constrained synthesis with projected diffusion models. In: Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C. (eds.) Advances in Neural Information Processing Systems. vol. 37, pp. 89307–89333. Curran Associates, Inc. (2024). https://doi.org/10.52202/079017-2834, https://proce...
[12] Chung, H., Kim, J., McCann, M.T., Klasky, M.L., Ye, J.C.: Diffusion posterior sampling for general noisy inverse problems. In: The Eleventh International Conference on Learning Representations (2023), https://openreview.net/forum?id=OnD9zGAGT0k
[13] Dabral, R., Mughal, M.H., Golyanik, V., Theobalt, C.: Mofusion: A framework for denoising-diffusion-based motion synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9760–9770 (2023)
[14] Degardin, B., Neves, J., Lopes, V., Brito, J., Yaghoubi, E., Proença, H.: Generative adversarial graph convolutional networks for human action synthesis. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1150–1159 (2022)
[15] Domingo-Enrich, C., Drozdzal, M., Karrer, B., Chen, R.T.Q.: Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=xQBRrtQM8u
[16] Dou, H., Chen, Z., Li, Z., Li, H., Yang, L., Deng, Y.: Constrained particle seeking: Solving diffusion inverse problems with just forward passes. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 20870–20878 (2026)
[17] Efron, B.: Tweedie’s formula and selection bias. Journal of the American Statistical Association 106(496), 1602–1614 (2011)
[18] Foo, L.G., Gong, J., Rahmani, H., Liu, J.: Distribution-aligned diffusion for human mesh recovery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9221–9232 (2023)
[19] Girolamo, M., Lorenzo, M., Bicchierai, M., Berretti, S., Bagdanov, A.D.: No mocap needed: Post-training motion diffusion models with reinforcement learning using only textual prompts. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 967–976 (2026)
[20] Gong, J., Foo, L.G., Fan, Z., Ke, Q., Rahmani, H., Liu, J.: Diffpose: Toward more reliable 3d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13041–13051 (2023)
[21] Grenander, U., Miller, M.I.: Representations of knowledge in complex systems. Journal of the Royal Statistical Society: Series B (Methodological) 56(4), 549–581 (1994)
[22] Guo, C., Mu, Y., Javed, M.G., Wang, S., Cheng, L.: Momask: Generative masked modeling of 3d human motions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1900–1910 (2024)
[23] Guo, C., Zou, S., Zuo, X., Wang, S., Ji, W., Li, X., Cheng, L.: Generating diverse and natural 3d human motions from text. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5152–5161 (June 2022)
[24] Guo, Z., Hu, Z., Soh, D.W., Zhao, N.: Motionlab: Unified human motion generation and editing via the motion-condition-motion paradigm. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 13869–13879 (2025)
[25] Han, G., Liang, M., Tang, J., Cheng, Y., Liu, W., Huang, S.: Reindiffuse: Crafting physically plausible motions with reinforced diffusion model. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 2218–2227. IEEE (2025)
[26] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems 33, 6840–6851 (2020)
[27] Huang, Y., Ghatare, A., Liu, Y., Hu, Z., Zhang, Q., Sastry, C.S., Gururani, S., Oore, S., Yue, Y.: Symbolic music generation with non-differentiable rule guided diffusion. In: International Conference on Machine Learning. pp. 19772–19797. PMLR (2024)
[28] Hwang, I., Bae, J., Lim, D., Kim, Y.M.: Motion synthesis with sparse and flexible keyjoint control. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 13203–13213 (October 2025)
[29] Jeong, M., Hwang, Y., Lee, J., Jung, S., Kim, W.H.: HGM³: Hierarchical generative masked motion modeling with hard token mining. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=IEul1M5pyk
[30] Jiang, B., Chen, X., Liu, W., Yu, J., Yu, G., Chen, T.: Motiongpt: Human motion as a foreign language. Advances in Neural Information Processing Systems 36, 20067–20079 (2023)
[31] Karunratanakul, K., Preechakul, K., Aksan, E., Beeler, T., Suwajanakorn, S., Tang, S.: Optimizing diffusion noise can serve as universal motion priors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1334–1345 (2024)
[32] Karunratanakul, K., Preechakul, K., Suwajanakorn, S., Tang, S.: Guided motion diffusion for controllable human motion synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 2151–2162 (October 2023)
[33] Kim, J., Kim, B.S., Ye, J.C.: Free2guide: Training-free text-to-video alignment using image lvlm. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 17920–17929 (October 2025)
[34] Kroese, D.P., Rubinstein, R.Y., Taimre, T.: Application of the cross-entropy method to clustering and vector quantization. Journal of Global Optimization 37(1), 137–157 (2007)
[35] Lee, K., Lee, S., Lee, J.: Interactive character animation by learning multi-objective control. ACM Trans. Graph. 37(6) (Dec 2018). https://doi.org/10.1145/3272127.3275071
[36] Li, J., Clegg, A., Mottaghi, R., Wu, J., Puig, X., Liu, C.K.: Controllable human-object interaction synthesis. In: European Conference on Computer Vision. pp. 54–72. Springer (2024)
[37] Li, Z., Cheng, K., Ghosh, A., Bhattacharya, U., Gui, L., Bera, A.: Simmotionedit: Text-based human motion editing with motion similarity prediction. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 27827–27837 (2025)
[38] Li, Z., Luo, M., Hou, R., Zhao, X., Liu, H., Chang, H., Liu, Z., Li, C.: Morph: A motion-free physics optimization framework for human motion generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 14580–14589 (2025)
[39] Li, Z., Rahmani, H., Ke, Q., Liu, J.: Longdiff: Training-free long video generation in one go. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 17789–17798 (June 2025)
[40] Li, Z., Rahmani, H., Zhang, J., Xue, Y., Mirmehdi, M., Kuen, J., Gu, J., Liu, J.: Diffgraph: An automated agent-driven model merging framework for in-the-wild text-to-image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 36713–36723 (2026)
[41] Liu, H., Zhan, X., Huang, S., Mu, T.J., Shan, Y.: Programmable motion generation for open-set motion control tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1399–1408 (2024)
[42] Louis, N., Khoshlessan, M., Corso, J.J.: Measuring physical plausibility of 3d human poses using physics simulation. In: 35th British Machine Vision Conference 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024. BMVA (2024), https://bmva-archive.org.uk/bmvc/2024/papers/Paper_615/paper.pdf
[43] Luo, Z., Iwase, S., Yuan, Y., Kitani, K.: Embodied scene-aware human pose estimation. Advances in Neural Information Processing Systems 35, 6815–6828 (2022)
[44] Meng, Z., Xie, Y., Peng, X., Han, Z., Jiang, H.: Rethinking diffusion for text-driven human motion generation: Redundant representations, evaluation, and masked autoregression. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 27859–27871 (2025)
[45] Naidu, D.S.: Optimal control systems. Electrical Engineering Series, CRC press (2018), https://books.google.co.uk/books?id=Kn50DwAAQBAJ
[46] Oksendal, B.: Stochastic differential equations: an introduction with applications. Universitext, Springer Science & Business Media (2013), https://books.google.co.uk/books?id=gizqCAAAQBAJ
[47] Ota, S., Yu, Q., Fujiwara, K., Ikehata, S., Sato, I.: Pino: Person-interaction noise optimization for long-duration and customizable motion generation of arbitrary-sized groups. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10676–10685 (2025)
[48] Pandey, K., Sofian, F.M., Draxler, F., Karaletsos, T., Mandt, S.: Variational control for guidance in diffusion models. In: Forty-second International Conference on Machine Learning (2025), https://openreview.net/forum?id=Z0ffRRtOim
[49] Pavon, M.: Stochastic control and nonequilibrium thermodynamical systems. Applied Mathematics and Optimization 19(1), 187–202 (1989)
[50] Pinyoanuntapong, E., Saleem, M., Karunratanakul, K., Wang, P., Xue, H., Chen, C., Guo, C., Cao, J., Ren, J., Tulyakov, S.: Maskcontrol: Spatio-temporal control for masked motion synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9955–9965 (2025)
[51] Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017)
[52] Shafir, Y., Tevet, G., Kapon, R., Bermano, A.H.: Human motion diffusion as a generative prior. In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=dTpbEdN9kr
[53] Shen, Y., Jiang, X., Yang, Y., Wang, Y., Han, D., Li, D.: Understanding and improving training-free loss-based diffusion guidance. Advances in Neural Information Processing Systems 37, 108974–109002 (2024)
[54] Song, J., Zhang, Q., Yin, H., Mardani, M., Liu, M.Y., Kautz, J., Chen, Y., Vahdat, A.: Loss-guided diffusion models for plug-and-play controllable generation. In: International Conference on Machine Learning. pp. 32483–32498. PMLR (2023)
[55] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations (2021), https://openreview.net/forum?id=PxTIG12RRHS
[56] Spall, J.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control 37(3), 332–341 (1992). https://doi.org/10.1109/9.119632
[57] Stein, C.M.: Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9(6), 1135–1151 (1981), http://www.jstor.org/stable/2240405
[58] Tan, X., Wang, H., Geng, X., Zhou, P.: Sopo: Text-to-motion generation using semi-online preference optimization. In: Belgrave, D., Zhang, C., Lin, H., Pascanu, R., Koniusz, P., Ghassemi, M., Chen, N. (eds.) Advances in Neural Information Processing Systems. vol. 38, pp. 60714–60747. Curran Associates, Inc. (2025), https://proceedings.neurips.cc/paper_fi...
[59] Tashakori, A., Tashakori, A., Yang, G., Wang, Z.J., Servati, P.: Flexmotion: Lightweight, physics-aware, and controllable human motion generation. arXiv preprint arXiv:2501.16778 (2025)
[60] Tevet, G., Raab, S., Cohan, S., Reda, D., Luo, Z., Peng, X.B., Bermano, A.H., van de Panne, M.: CLoSD: Closing the loop between simulation and diffusion for multi-task character control. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=pZISppZSTv
[61] Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., Bermano, A.H.: Human motion diffusion model. In: The Eleventh International Conference on Learning Representations (2023), https://openreview.net/forum?id=SJ1kSyO2jwu
[62] Theodorou, E., Buchli, J., Schaal, S.: A generalized path integral control approach to reinforcement learning. The Journal of Machine Learning Research 11, 3137–3181 (2010)
[63] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 5026–5033 (2012). https://doi.org/10.1109/IROS.2012.6386109
[64] Wang, Z., Yu, P., Zhao, Y., Zhang, R., Zhou, Y., Yuan, J., Chen, C.: Learning diverse stochastic human-action generators by learning smooth latent transitions. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34, pp. 12281–12288 (2020)
[65] Watanabe, A., Ren, J., Siyao, L., Peng, Y., Wu, E., Simo-Serra, E.: Simdiff: Simulator-constrained diffusion model for physically plausible motion generation. arXiv preprint arXiv:2509.20927 (2025)
[66] Wen, Y., Shuai, Q., Kang, D., Li, J., Wen, C., Qian, Y., Jiao, N., Chen, C., Chen, W., Wang, Y., et al.: Hy-motion 1.0: Scaling flow matching models for text-to-motion generation. arXiv preprint arXiv:2512.23464 (2025)
[67] Weng, W., Tan, X., Wang, J., Xie, G.S., Zhou, P., Wang, H.: Realign: text-to-motion generation via step-aware reward-guided alignment. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 10621–10629 (2026)
[68] Williams, D.: Probability with Martingales. Cambridge Mathematical Textbooks, Cambridge University Press (1991), https://books.google.co.uk/books?id=e9saZ0YSi-AC
[69] Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3), 229–256 (1992)
[70] Xie, Y., Jampani, V., Zhong, L., Sun, D., Jiang, H.: Omnicontrol: Control any joint at any time for human motion generation. In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=gd0lAEtWso
[71] Xu, L., Qu, H., Cai, Y., Liu, J.: 6d-diff: A keypoint diffusion framework for 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9676–9686 (2024)
[72] Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B., Yang, M.H.: Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 56(4) (Nov 2023). https://doi.org/10.1145/3626235
[73] Yin, T., Hoyet, L., Christie, M., Cani, M.P., Pettré, J.: The one-man-crowd: Single user generation of crowd motions using virtual reality. IEEE Transactions on Visualization and Computer Graphics 28(5), 2245–2255 (2022). https://doi.org/10.1109/TVCG.2022.3150507
[74] Yong, J., Zhou, X.Y.: Stochastic controls: Hamiltonian systems and HJB equations, vol. 43. Springer Science & Business Media (1999)
[75] Yu, H., Liu, J., Gui, X., Wong, M., Hou, Y., Ong, Y.S.: A plug-and-play multi-criteria guidance for diverse in-betweening human motion generation. arXiv preprint arXiv:2508.01590 (2025)
[76] Yuan, Y., Song, J., Iqbal, U., Vahdat, A., Kautz, J.: Physdiff: Physics-guided human motion diffusion model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 16010–16021 (2023)
[77] Zhang, J., Zhang, Y., Cun, X., Zhang, Y., Zhao, H., Lu, H., Shen, X., Shan, Y.: Generating human motion from textual descriptions with discrete representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14730–14740 (2023)
[78] Zhang, M., Cai, Z., Pan, L., Hong, F., Guo, X., Yang, L., Liu, Z.: Motiondiffuse: Text-driven human motion generation with diffusion model. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(6), 4115–4128 (2024)
[79] Zhang, R., Li, C., Lou, Y., Shi, Y., Wang, H., Huang, Y.: Steering where to diffuse: Generative modeling of phenotypic response simulation with steered diffusion bridge. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 27367–27377 (June 2026)
[80] Zhang, S., Zhang, W., Gu, Q.: Energy-weighted flow matching for offline reinforcement learning. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=HA0oLUvuGI

Showing first 80 references.

[1] [1]

Princeton University Press (2021),https://books.google.co.uk/books? id=qZ0DEAAAQBAJ

Åström, K.J., Murray, R.: Feedback systems: an introduction for scientists and en- gineers. Princeton University Press (2021),https://books.google.co.uk/books? id=qZ0DEAAAQBAJ

2021

[2] [2]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Bae, J., Hwang, I., Lee, Y.Y., Guo, Z., Liu, J., Ben-Shabat, Y., Kim, Y.M., Kapa- dia, M.: Less is more: Improving motion diffusion models with sparse keyframes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11069–11078 (2025)

2025

[3] [3]

Actuators12(7) (2023).https://doi.org/10.3390/ act12070282,https://www.mdpi.com/2076-0825/12/7/282

Blaha, T.M., Smeur, E.J.J., Remes, B.D.W.: A survey of optimal control allocation for aerial vehicle control. Actuators12(7) (2023).https://doi.org/10.3390/ act12070282,https://www.mdpi.com/2076-0825/12/7/282

2023

[4] [4]

In: Handbook of statistics, vol

Botev,Z.I.,Kroese,D.P.,Rubinstein,R.Y.,L’ecuyer,P.:Thecross-entropymethod for optimization. In: Handbook of statistics, vol. 31, pp. 35–59. Elsevier (2013)

2013

[5] [5]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Cai, Y., Wang, Y., Zhu, Y., Cham, T.J., Cai, J., Yuan, J., Liu, J., Zheng, C., Yan, S., Ding, H., et al.: A unified 3d human motion synthesis model via condi- tional variational auto-encoder. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11645–11655 (2021)

2021

[6] [6]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Cao, B., Zheng, S., Wang, Y., Xia, L., Wei, Q., Jin, Q., Liu, J., Lu, Z.: Motionc- trl: A real-time controllable vision-language-motion model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12253–12262 (2025)

2025

[7] [7]

IEEE Transactions on Visual- ization and Computer Graphics26(10), 3089–3108 (2020).https://doi.org/10

Caserman, P., Garcia-Agundez, A., Göbel, S.: A survey of full-body motion recon- struction in immersive virtual reality applications. IEEE Transactions on Visual- ization and Computer Graphics26(10), 3089–3108 (2020).https://doi.org/10. 1109/TVCG.2019.2912607

work page arXiv 2020

[8] [8]

arXiv preprint arXiv:2410.18977 (2024)

Chen, L.H., Lu, S., Dai, W., Dou, Z., Ju, X., Wang, J., Komura, T., Zhang, L.: Pay attention and move better: Harnessing attention for interactive motion generation and training-free editing. arXiv preprint arXiv:2410.18977 (2024)

work page arXiv 2024

[9] [9]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Chen, X., Jiang, B., Liu, W., Huang, Z., Fu, B., Chen, T., Yu, G.: Executing your commands via motion diffusion in latent space. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 18000–18010 (2023)

2023

[10] [10]

In: Proceedings of Robotics: Science and Systems

Cheng, X., Ji, Y., Chen, J., Yang, R., Yang, G., Wang, X.: Expressive Whole-Body Control for Humanoid Robots. In: Proceedings of Robotics: Science and Systems. Delft, Netherlands (July 2024).https://doi.org/10.15607/RSS.2024.XX.107

work page doi:10.15607/rss.2024.xx.107 2024

[11] [11]

In: Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C

Christopher, J.K., Baek, S., Fioretto, F.: Constrained synthesis with projected dif- fusion models. In: Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C. (eds.) Advances in Neural Information Processing Sys- tems. vol. 37, pp. 89307–89333. Curran Associates, Inc. (2024).https://doi. org/10.52202/079017-2834,https://proce...

work page doi:10.52202/079017-2834 2024

[12] [12]

In: The Eleventh International Con- ference on Learning Representations (2023),https://openreview.net/forum?id= OnD9zGAGT0k

Chung, H., Kim, J., Mccann, M.T., Klasky, M.L., Ye, J.C.: Diffusion posterior sampling for general noisy inverse problems. In: The Eleventh International Con- ference on Learning Representations (2023),https://openreview.net/forum?id= OnD9zGAGT0k

2023

[13] [13]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Dabral, R., Mughal, M.H., Golyanik, V., Theobalt, C.: Mofusion: A framework for denoising-diffusion-based motion synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9760–9770 (2023)

2023

[14] [14]

In: Proceed- ings of the IEEE/CVF Winter Conference on Applications of Computer Vision

Degardin, B., Neves, J., Lopes, V., Brito, J., Yaghoubi, E., Proença, H.: Generative adversarial graph convolutional networks for human action synthesis. In: Proceed- ings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1150–1159 (2022) Controllable Motion Generation under Heterogeneous Constraints 17

2022

[15] [15]

In: The Thirteenth International Conference on Learning Represen- tations (2025),https://openreview.net/forum?id=xQBRrtQM8u

Domingo-Enrich, C., Drozdzal, M., Karrer, B., Chen, R.T.Q.: Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic op- timal control. In: The Thirteenth International Conference on Learning Represen- tations (2025),https://openreview.net/forum?id=xQBRrtQM8u

2025

[16] [16]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Dou, H., Chen, Z., Li, Z., Li, H., Yang, L., Deng, Y.: Constrained particle seeking: Solving diffusion inverse problems with just forward passes. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 20870–20878 (2026)

2026

[17] [17]

Journal of the American Statistical Association106(496), 1602–1614 (2011)

Efron, B.: Tweedie’s formula and selection bias. Journal of the American Statistical Association106(496), 1602–1614 (2011)

2011

[18] [18]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Foo, L.G., Gong, J., Rahmani, H., Liu, J.: Distribution-aligned diffusion for hu- man mesh recovery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9221–9232 (2023)

2023

[19] [19]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

Girolamo, M., Lorenzo, M., Bicchierai, M., Berretti, S., Bagdanov, A.D.: No mocap needed: Post-training motion diffusion models with reinforcement learning using only textual prompts. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 967–976 (2026)

2026

[20] [20]

In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition

Gong, J., Foo, L.G., Fan, Z., Ke, Q., Rahmani, H., Liu, J.: Diffpose: Toward more reliable 3d pose estimation. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition. pp. 13041–13051 (2023)

2023

[21] [21]

Journal of the Royal Statistical Society: Series B (Methodological)56(4), 549–581 (1994)

Grenander, U., Miller, M.I.: Representations of knowledge in complex systems. Journal of the Royal Statistical Society: Series B (Methodological)56(4), 549–581 (1994)

1994

[22] [22]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Guo, C., Mu, Y., Javed, M.G., Wang, S., Cheng, L.: Momask: Generative masked modeling of 3d human motions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1900–1910 (2024)

1900

[23] [23]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Guo, C., Zou, S., Zuo, X., Wang, S., Ji, W., Li, X., Cheng, L.: Generating di- verse and natural 3d human motions from text. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5152–5161 (June 2022)

2022

[24] [24]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Guo, Z., Hu, Z., Soh, D.W., Zhao, N.: Motionlab: Unified human motion gener- ation and editing via the motion-condition-motion paradigm. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 13869–13879 (2025)

2025

[25] [25]

In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Han, G., Liang, M., Tang, J., Cheng, Y., Liu, W., Huang, S.: Reindiffuse: Crafting physically plausible motions with reinforced diffusion model. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 2218–2227. IEEE (2025)

2025

[26] [26]

Advances in neural information processing systems33, 6840–6851 (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems33, 6840–6851 (2020)

2020

[27] [27]

In: International Conference on Machine Learning

Huang,Y.,Ghatare,A.,Liu,Y.,Hu,Z.,Zhang,Q.,Sastry,C.S.,Gururani,S.,Oore, S.,Yue,Y.:Symbolicmusicgenerationwithnon-differentiableruleguideddiffusion. In: International Conference on Machine Learning. pp. 19772–19797. PMLR (2024)

2024

[28] [28]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Hwang, I., Bae, J., Lim, D., Kim, Y.M.: Motion synthesis with sparse and flexible keyjoint control. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 13203–13213 (October 2025)

2025

[29] [29]

In: The Thirteenth International Conferenceon LearningRepresentations(2025),https://openreview.net/forum? id=IEul1M5pyk 18 X

Jeong, M., Hwang, Y., Lee, J., Jung, S., Kim, W.H.: HGM³: Hierarchical generative masked motion modeling with hard token mining. In: The Thirteenth International Conferenceon LearningRepresentations(2025),https://openreview.net/forum? id=IEul1M5pyk 18 X. Hui et al

2025

[30] [30]

Advances in Neural Information Processing Systems36, 20067–20079 (2023)

Jiang, B., Chen, X., Liu, W., Yu, J., Yu, G., Chen, T.: Motiongpt: Human motion as a foreign language. Advances in Neural Information Processing Systems36, 20067–20079 (2023)

2023

[31] [31]

In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion

Karunratanakul, K., Preechakul, K., Aksan, E., Beeler, T., Suwajanakorn, S., Tang, S.: Optimizing diffusion noise can serve as universal motion priors. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion. pp. 1334–1345 (2024)

2024

[32] [32]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Karunratanakul, K., Preechakul, K., Suwajanakorn, S., Tang, S.: Guided mo- tion diffusion for controllable human motion synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 2151–2162 (October 2023)

2023

[33] [33]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Kim, J., Kim, B.S., Ye, J.C.: Free2guide: Training-free text-to-video alignment using image lvlm. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 17920–17929 (October 2025)

2025

[34] [34]

Journal of Global Optimization 37(1), 137–157 (2007)

Kroese, D.P., Rubinstein, R.Y., Taimre, T.: Application of the cross-entropy method to clustering and vector quantization. Journal of Global Optimization 37(1), 137–157 (2007)

2007

[35] [35]

ACM Trans

Lee, K., Lee, S., Lee, J.: Interactivecharacter animation by learning multi-objective control. ACM Trans. Graph.37(6) (Dec 2018).https://doi.org/10.1145/ 3272127.3275071,https://doi.org/10.1145/3272127.3275071

work page doi:10.1145/3272127.3275071 2018

[36] [36]

In: European Conference on Computer Vision

Li, J., Clegg, A., Mottaghi, R., Wu, J., Puig, X., Liu, C.K.: Controllable human- object interaction synthesis. In: European Conference on Computer Vision. pp. 54–72. Springer (2024)

2024

[37] [37]

In: Proceed- ings of the Computer Vision and Pattern Recognition Conference

Li, Z., Cheng, K., Ghosh, A., Bhattacharya, U., Gui, L., Bera, A.: Simmotionedit: Text-based human motion editing with motion similarity prediction. In: Proceed- ings of the Computer Vision and Pattern Recognition Conference. pp. 27827–27837 (2025)

2025

[38] [38]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Li, Z., Luo, M., Hou, R., Zhao, X., Liu, H., Chang, H., Liu, Z., Li, C.: Morph: A motion-free physics optimization framework for human motion generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 14580–14589 (2025)

2025

[39] [39]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Li, Z., Rahmani, H., Ke, Q., Liu, J.: Longdiff: Training-free long video generation in one go. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 17789–17798 (June 2025)

2025

[40] [40]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, Z., Rahmani, H., Zhang, J., Xue, Y., Mirmehdi, M., Kuen, J., Gu, J., Liu, J.: Diffgraph: An automated agent-driven model merging framework for in-the- wild text-to-image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 36713–36723 (2026)

2026

[41] [41]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Liu, H., Zhan, X., Huang, S., Mu, T.J., Shan, Y.: Programmable motion generation for open-set motion control tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1399–1408 (2024)

2024

[42] [42]

In: 35th British Machine Vision Confer- ence 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024

Louis, N., Khoshlessan, M., Corso, J.J.: Measuring physical plausibility of 3d human poses using physics simulation. In: 35th British Machine Vision Confer- ence 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024. BMVA (2024), https://bmva-archive.org.uk/bmvc/2024/papers/Paper_615/paper.pdf

2024

[43] [43]

Advances in Neural Information Processing Systems35, 6815–6828 (2022)

Luo, Z., Iwase, S., Yuan, Y., Kitani, K.: Embodied scene-aware human pose esti- mation. Advances in Neural Information Processing Systems35, 6815–6828 (2022)

2022

[44] [44]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Meng, Z., Xie, Y., Peng, X., Han, Z., Jiang, H.: Rethinking diffusion for text-driven human motion generation: Redundant representations, evaluation, and masked au- toregression. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 27859–27871 (2025) Controllable Motion Generation under Heterogeneous Constraints 19

2025

[45] [45]

Electrical Engineering Series, CRC press (2018),https://books.google.co.uk/books?id=Kn50DwAAQBAJ

Naidu, D.S.: Optimal control systems. Electrical Engineering Series, CRC press (2018),https://books.google.co.uk/books?id=Kn50DwAAQBAJ

2018

[46] [46]

Universitext, Springer Science & Business Media (2013),https://books.google

Oksendal, B.: Stochastic differential equations: an introduction with applications. Universitext, Springer Science & Business Media (2013),https://books.google. co.uk/books?id=gizqCAAAQBAJ

2013

[47] [47]

In: Proceedings of the IEEE/CVF International Conference on Com- puter Vision

Ota, S., Yu, Q., Fujiwara, K., Ikehata, S., Sato, I.: Pino: Person-interaction noise optimization for long-duration and customizable motion generation of arbitrary- sized groups. In: Proceedings of the IEEE/CVF International Conference on Com- puter Vision. pp. 10676–10685 (2025)

2025

[48] [48]

In: Forty-second International Conference on Machine Learning (2025),https://openreview.net/forum?id=Z0ffRRtOim

Pandey, K., Sofian, F.M., Draxler, F., Karaletsos, T., Mandt, S.: Variational con- trol for guidance in diffusion models. In: Forty-second International Conference on Machine Learning (2025),https://openreview.net/forum?id=Z0ffRRtOim

2025

[49] [49]

Ap- plied Mathematics and Optimization19(1), 187–202 (1989)

Pavon, M.: Stochastic control and nonequilibrium thermodynamical systems. Ap- plied Mathematics and Optimization19(1), 187–202 (1989)

1989

[50] [50]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Pinyoanuntapong, E., Saleem, M., Karunratanakul, K., Wang, P., Xue, H., Chen, C., Guo, C., Cao, J., Ren, J., Tulyakov, S.: Maskcontrol: Spatio-temporal con- trol for masked motion synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9955–9965 (2025)

2025

[51] [51]

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[52] [52]

In: The Twelfth International Conference on Learning Represen- tations (2024),https://openreview.net/forum?id=dTpbEdN9kr

Shafir, Y., Tevet, G., Kapon, R., Bermano, A.H.: Human motion diffusion as a generative prior. In: The Twelfth International Conference on Learning Represen- tations (2024),https://openreview.net/forum?id=dTpbEdN9kr

2024

[53] [53]

Shen, Y., Jiang, X., Yang, Y., Wang, Y., Han, D., Li, D.: Understanding and im- provingtraining-freeloss-baseddiffusionguidance.AdvancesinNeuralInformation Processing Systems37, 108974–109002 (2024)

2024

[54] [54]

In: International Conference on Machine Learning

Song, J., Zhang, Q., Yin, H., Mardani, M., Liu, M.Y., Kautz, J., Chen, Y., Vah- dat, A.: Loss-guided diffusion models for plug-and-play controllable generation. In: International Conference on Machine Learning. pp. 32483–32498. PMLR (2023)

2023

[55] [55]

In: Interna- tional Conference on Learning Representations (2021),https://openreview.net/ forum?id=PxTIG12RRHS

Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score- based generative modeling through stochastic differential equations. In: Interna- tional Conference on Learning Representations (2021),https://openreview.net/ forum?id=PxTIG12RRHS

2021

[56] [56]

IEEE Transactions on Automatic Control37(3), 332–341 (1992).https://doi.org/10.1109/9.119632

Spall, J.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control37(3), 332–341 (1992).https://doi.org/10.1109/9.119632

work page doi:10.1109/9.119632 1992

[57] [57]

Estimation of the mean of a multivariate normal distribution,

Stein, C.M.: Estimation of the mean of a multivariate normal distribution. The An- nals of Statistics9(6), 1135–1151 (1981),http://www.jstor.org/stable/2240405

work page arXiv 1981

[58] [58]

In: Belgrave, D., Zhang, C., Lin, H., Pascanu, R., Koniusz, P., Ghassemi, M., Chen, N

Tan, X., Wang, H., Geng, X., Zhou, P.: Sopo: Text-to-motion generation us- ing semi-online preference optimization. In: Belgrave, D., Zhang, C., Lin, H., Pascanu, R., Koniusz, P., Ghassemi, M., Chen, N. (eds.) Advances in Neural Information Processing Systems. vol. 38, pp. 60714–60747. Curran Associates, Inc. (2025),https://proceedings.neurips.cc/paper_fi...

2025

[59] [59]

arXiv preprint arXiv:2501.16778 (2025)

Tashakori, A., Tashakori, A., Yang, G., Wang, Z.J., Servati, P.: Flexmotion: Lightweight, physics-aware, and controllable human motion generation. arXiv preprint arXiv:2501.16778 (2025)

work page arXiv 2025

[60] [60]

Hui et al

Tevet, G., Raab, S., Cohan, S., Reda, D., Luo, Z., Peng, X.B., Bermano, A.H., van de Panne, M.: CLoSD: Closing the loop between simulation and diffu- 20 X. Hui et al. sion for multi-task character control. In: The Thirteenth International Confer- ence on Learning Representations (2025),https://openreview.net/forum?id= pZISppZSTv

2025

[61] [61]

In: The Eleventh International Conference on Learning Representations (2023),https://openreview.net/forum?id=SJ1kSyO2jwu

Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., Bermano, A.H.: Human motion diffusion model. In: The Eleventh International Conference on Learning Representations (2023),https://openreview.net/forum?id=SJ1kSyO2jwu

2023

[62] [62]

The Journal of Machine Learning Research11, 3137– 3181 (2010)

Theodorou, E., Buchli, J., Schaal, S.: A generalized path integral control approach to reinforcement learning. The Journal of Machine Learning Research11, 3137– 3181 (2010)

2010

[63] [63]

In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems

Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 5026–5033 (2012).https://doi.org/10.1109/IROS.2012.6386109

work page doi:10.1109/iros.2012.6386109 2012

[64] [64]

In: Proceedings of the AAAI conference on artificial intelligence

Wang, Z., Yu, P., Zhao, Y., Zhang, R., Zhou, Y., Yuan, J., Chen, C.: Learning diverse stochastic human-action generators by learning smooth latent transitions. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 12281– 12288 (2020)

2020

[65] [65]

arXiv preprint arXiv:2509.20927 (2025)

Watanabe, A., Ren, J., Siyao, L., Peng, Y., Wu, E., Simo-Serra, E.: Simdiff: Simulator-constrained diffusion model for physically plausible motion generation. arXiv preprint arXiv:2509.20927 (2025)

work page arXiv 2025

[66] [66]

arXiv preprint arXiv:2512.23464 (2025)

Wen, Y., Shuai, Q., Kang, D., Li, J., Wen, C., Qian, Y., Jiao, N., Chen, C., Chen, W., Wang, Y., et al.: Hy-motion 1.0: Scaling flow matching models for text-to- motion generation. arXiv preprint arXiv:2512.23464 (2025)

work page arXiv 2025

[67] [67]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Weng, W., Tan, X., Wang, J., Xie, G.S., Zhou, P., Wang, H.: Realign: text-to- motion generation via step-aware reward-guided alignment. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 10621–10629 (2026)

2026

[68] [68]

Cambridge mathematical textbooks, Cambridge University Press (1991),https://books.google.co.uk/books?id= e9saZ0YSi-AC

Williams, D.: Probability with Martingales. Cambridge mathematical textbooks, Cambridge University Press (1991),https://books.google.co.uk/books?id= e9saZ0YSi-AC

1991

[69] [69]

Machine learning8(3), 229–256 (1992)

Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning8(3), 229–256 (1992)

1992

[70] [70]

In: The Twelfth International Conferenceon LearningRepresentations(2024),https://openreview.net/forum? id=gd0lAEtWso

Xie, Y., Jampani, V., Zhong, L., Sun, D., Jiang, H.: Omnicontrol: Control any joint at any time for human motion generation. In: The Twelfth International Conferenceon LearningRepresentations(2024),https://openreview.net/forum? id=gd0lAEtWso

2024

[71] Xu, L., Qu, H., Cai, Y., Liu, J.: 6d-diff: A keypoint diffusion framework for 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9676–9686 (2024)

[72] Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B., Yang, M.H.: Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 56(4) (Nov 2023). https://doi.org/10.1145/3626235

[73] Yin, T., Hoyet, L., Christie, M., Cani, M.P., Pettré, J.: The one-man-crowd: Single user generation of crowd motions using virtual reality. IEEE Transactions on Visualization and Computer Graphics 28(5), 2245–2255 (2022). https://doi.org/10.1109/TVCG.2022.3150507

[74] Yong, J., Zhou, X.Y.: Stochastic controls: Hamiltonian systems and HJB equations, vol. 43. Springer Science & Business Media (1999)

[75] Yu, H., Liu, J., Gui, X., Wong, M., Hou, Y., Ong, Y.S.: A plug-and-play multi-criteria guidance for diverse in-betweening human motion generation. arXiv preprint arXiv:2508.01590 (2025)

[76] Yuan, Y., Song, J., Iqbal, U., Vahdat, A., Kautz, J.: Physdiff: Physics-guided human motion diffusion model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 16010–16021 (2023)

[77] Zhang, J., Zhang, Y., Cun, X., Zhang, Y., Zhao, H., Lu, H., Shen, X., Shan, Y.: Generating human motion from textual descriptions with discrete representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14730–14740 (2023)

[78] Zhang, M., Cai, Z., Pan, L., Hong, F., Guo, X., Yang, L., Liu, Z.: Motiondiffuse: Text-driven human motion generation with diffusion model. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(6), 4115–4128 (2024)

[79] Zhang, R., Li, C., Lou, Y., Shi, Y., Wang, H., Huang, Y.: Steering where to diffuse: Generative modeling of phenotypic response simulation with steered diffusion bridge. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 27367–27377 (June 2026)

[80] Zhang, S., Zhang, W., Gu, Q.: Energy-weighted flow matching for offline reinforcement learning. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=HA0oLUvuGI