OMG: Omni-Modal Motion Generation for Generalist Humanoid Control
Pith reviewed 2026-06-27 13:12 UTC · model grok-4.3
The pith
A diffusion model conditions on language, audio and reference motions to drive generalist whole-body humanoid control from curated data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OMG consists of a meticulous data curation, filtering and labeling pipeline plus a diffusion-based motion generation backbone that conditions on language, audio and human reference motions. The architecture places this generator as a reasoning brain above a reactive cerebellum. Experiments demonstrate that the resulting controller achieves state-of-the-art whole-body performance, exhibits model scaling, and adapts efficiently to new distributions and modalities.
What carries the argument
Diffusion-based motion generation backbone conditioned on language, audio and human reference motions, supported by a data curation pipeline.
If this is right
- The controller reaches state-of-the-art performance on whole-body motion tasks.
- Performance improves with larger model size according to scaling laws.
- The same model adapts quickly to new data distributions and input modalities.
- The approach constitutes a step toward foundation models for humanoid robots.
Where Pith is reading between the lines
- The same hierarchical separation of reasoning and tracking layers could be tested on non-humanoid robots that require multi-modal commands.
- The data curation steps may transfer to other motion-generation domains where high-quality paired examples are scarce.
- If scaling continues, the model could eventually accept longer-horizon language instructions without additional reward engineering.
Load-bearing premise
A scalable multi-modal reasoning module placed on a reactive motion tracker is enough to reach general-purpose humanoid control.
What would settle it
A controlled test in which the model shows no performance gain when scaled in size or fails to adapt when a new input modality is added after fine-tuning.
Figures
read the original abstract
Humanoid whole-body control has made significant progress in recent years, yet existing approaches remain limited to few-skill policies with heavy reward engineering, or motion trackers that are difficult to extend to new input modalities. We argue that the key to general-purpose humanoid control is to build a scalable brain, a module capable of reasoning with diverse conditioning modalities, atop a reactive motion tracking cerebellum, mirroring the hierarchical structure of biological motor systems. Two challenges arise in realizing this vision: acquiring a vast amount of high-quality data to achieve general purpose control, and equipping the generator with the capability to condition on compositional, extensible multi-modal inputs. We present OMG, which addresses these challenges with a meticulous data curation, filtering and labeling pipeline, as well as a diffusion-based motion generation backbone that conditions on language, audio, and human reference motions. Extensive experiments validate OMG as an omni-modal whole-body controller exhibiting state-of-the-art performance, model scaling behavior and efficient adaptation to new distributions and modalities, marking a concrete step toward foundation models for humanoid robots.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces OMG, a diffusion-based omni-modal motion generator for humanoid whole-body control. It builds a scalable multi-modal reasoning module (brain) atop a reactive motion tracker (cerebellum), enabled by a data curation, filtering, and labeling pipeline that supports conditioning on language, audio, and human reference motions. The authors claim that extensive experiments demonstrate state-of-the-art performance, model scaling behavior, and efficient adaptation to new distributions and modalities, advancing toward foundation models for humanoid robots.
Significance. If the experimental results hold, the work would constitute a meaningful step toward generalist humanoid controllers by addressing the extensibility limitations of few-skill policies and non-extendable trackers through hierarchical multi-modal design and curated data pipelines.
major comments (1)
- [Abstract] Abstract: The central claims of state-of-the-art performance, model scaling behavior, and efficient adaptation to new modalities rest entirely on unspecified experiments; no quantitative metrics, baseline comparisons, dataset sizes, error bars, or ablation results are provided to support these assertions, which are load-bearing for the paper's contribution.
Simulated Author's Rebuttal
We thank the referee for their review and the opportunity to clarify the presentation of our results. The primary concern raised is addressed point-by-point below.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claims of state-of-the-art performance, model scaling behavior, and efficient adaptation to new modalities rest entirely on unspecified experiments; no quantitative metrics, baseline comparisons, dataset sizes, error bars, or ablation results are provided to support these assertions, which are load-bearing for the paper's contribution.
Authors: We agree that the abstract, as currently written, is a high-level summary that does not enumerate specific quantitative results. The full manuscript contains these details in the Experiments section, including direct comparisons against baselines, dataset statistics from the curation pipeline, statistical error bars across runs, and ablations isolating the contributions of the multi-modal conditioning and data pipeline. To make the abstract self-contained and better support the load-bearing claims, we will revise it to include concise references to key quantitative outcomes (e.g., performance deltas and data scale) while preserving its brevity. revision: yes
Circularity Check
No significant circularity
full rationale
The paper presents an empirical system (OMG) consisting of a data curation pipeline and a diffusion-based multi-modal motion generator, with performance claims supported by experiments on SOTA results, scaling, and adaptation. No derivation chain, equations, or first-principles reductions appear in the provided abstract or description. Claims are framed as outcomes of training and evaluation rather than predictions forced by fitted parameters or self-citations. The hierarchical brain-cerebellum motif is presented as a design choice mirroring biology, not as a mathematically derived necessity. No load-bearing steps reduce to self-definition, renaming, or imported uniqueness theorems. The work is self-contained as an engineering contribution validated externally via benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
M. Chen, K. Wang, B. Zhang, X. Ma, Z. Yang, Y . Ren, Q. Huang, Z. Zhu, Y . Wang, and Z. Su. Holomotion-1 technical report, 2026. URLhttps://arxiv.org/abs/2605.15336
Pith/arXiv arXiv 2026
- [2]
-
[3]
I. Radosavovic, S. Kamat, T. Darrell, and J. Malik. Learning humanoid locomotion over chal- lenging terrain.arXiv preprint arXiv:2410.03654, 2024
arXiv 2024
- [4]
-
[5]
Y . Li, Y . Zhang, W. Xiao, C. Pan, H. Weng, G. He, T. He, and G. Shi. Hold my beer: Learning gentle humanoid locomotion and end-effector stabilization control.arXiv preprint arXiv:2505.24198, 2025
arXiv 2025
-
[6]
Z. Luo, Y . Yuan, T. Wang, C. Li, S. Chen, F. Castaneda, Z.-A. Cao, J. Li, D. Minor, Q. Ben, et al. Sonic: Supersizing motion tracking for natural humanoid whole-body control.arXiv preprint arXiv:2511.07820, 2025
Pith/arXiv arXiv 2025
- [7]
-
[8]
Q. Liao, T. E. Truong, X. Huang, Y . Gao, G. Tevet, K. Sreenath, and C. K. Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion.arXiv preprint arXiv:2508.08241, 2025
Pith/arXiv arXiv 2025
-
[9]
Serifi, R
A. Serifi, R. Grandia, E. Knoop, M. Gross, and M. Bächer. Robot motion diffusion model: Motion generation for robotic characters. InSIGGRAPH asia 2024 conference papers, pages 1–9, 2024
2024
-
[10]
Tevet, S
G. Tevet, S. Raab, S. Cohan, D. Reda, Z. Luo, X. B. Peng, A. Bermano, and M. Van de Panne. Closd: Closing the loop between simulation and diffusion for multi-task character control. InInternational Conference on Learning Representations, volume 2025, pages 46506–46520, 2025
2025
-
[11]
M. Xu, Y . Shi, K. Yin, and X. B. Peng. Parc: Physics-based augmentation with reinforcement learning for character controllers. InProceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, pages 1–11, 2025
2025
-
[12]
Z. Zhang, K. Wen, M. Xu, J. He, C. Li, T. Miki, C. Schwarke, C. Zhang, X. B. Peng, and M. Hutter. Learning whole-body humanoid locomotion via motion generation and motion tracking.arXiv preprint arXiv:2604.17335, 2026
Pith/arXiv arXiv 2026
-
[13]
W. Xie, J. Zheng, J. Han, J. Shi, W. Zhang, C. Bai, and X. Li. Textop: Real-time interactive text-driven humanoid robot motion generation and control.arXiv preprint arXiv:2602.07439, 2026
arXiv 2026
-
[14]
Schuhmann, R
C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models.Advances in neural information processing systems, 35: 25278–25294, 2022. 9
2022
-
[15]
Penedo, H
G. Penedo, H. Kydlí ˇcek, A. Lozhkov, M. Mitchell, C. Raffel, L. V on Werra, T. Wolf, et al. The fineweb datasets: Decanting the web for the finest text data at scale.Advances in Neural Information Processing Systems, 37:30811–30849, 2024
2024
-
[16]
T. He, Z. Luo, W. Xiao, C. Zhang, K. Kitani, C. Liu, and G. Shi. Learning human-to-humanoid real-time whole-body teleoperation.arXiv preprint arXiv:2403.04436, 2024
arXiv 2024
-
[17]
T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi. Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning.arXiv preprint arXiv:2406.08858, 2024
arXiv 2024
-
[18]
Y . Ze, Z. Chen, J. P. Araújo, Z. ang Cao, X. B. Peng, J. Wu, and C. K. Liu. Twist: Teleoperated whole-body imitation system.arXiv preprint arXiv:2505.02833, 2025
arXiv 2025
-
[19]
G. Tevet, S. Raab, B. Gordon, Y . Shafir, D. Cohen-Or, and A. H. Bermano. Human motion diffusion model.arXiv preprint arXiv:2209.14916, 2022
Pith/arXiv arXiv 2022
- [20]
-
[21]
L. Yang, Z. Zhang, Y . Song, S. Hong, R. Xu, Y . Zhao, W. Zhang, B. Cui, and M.-H. Yang. Diffusion models: A comprehensive survey of methods and applications.ACM computing surveys, 56(4):1–39, 2023
2023
-
[22]
Peebles and S
W. Peebles and S. Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023
2023
-
[23]
H. Liu, Z. Zhu, G. Becherini, Y . Peng, M. Su, Y . Zhou, X. Zhe, N. Iwamoto, B. Zheng, and M. J. Black. Emage: Towards unified holistic co-speech gesture generation via expressive masked audio gesture modeling, 2024. URLhttps://arxiv.org/abs/2401.00374
arXiv 2024
- [24]
-
[25]
J. Li, J. Cao, H. Zhang, D. Rempe, J. Kautz, U. Iqbal, and Y . Yuan. Genmo: A generalist model for human motion. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
2025
-
[26]
D. Rempe, M. Petrovich, Y . Yuan, H. Zhang, X. B. Peng, Y . Jiang, T. Wang, U. Iqbal, D. Minor, M. de Ruyter, J. Li, C. Tessler, E. Lim, E. Jeong, S. Wu, E. Hassani, M. Huang, J.-B. Yu, C. Chung, L. Song, O. Dionne, J. Kautz, S. Yuen, and S. Fidler. Kimodo: Scaling controllable human motion generation.arXiv:2603.15546, 2026
arXiv 2026
-
[27]
K. Fan, S. Lu, M. Dai, R. Yu, L. Xiao, Z. Dou, J. Dong, L. Ma, and J. Wang. Go to zero: Towards zero-shot motion generation with million-scale data, 2025. URLhttps://arxiv. org/abs/2507.07095
arXiv 2025
-
[28]
Mahmood, N
N. Mahmood, N. Ghorbani, N. F. Troje, G. Pons-Moll, and M. J. Black. Amass: Archive of motion capture as surface shapes. InProceedings of the IEEE/CVF international conference on computer vision, pages 5442–5451, 2019
2019
-
[29]
F. G. Harvey, M. Yurick, D. Nowrouzezahrai, and C. Pal. Robust motion in-betweening. arXiv:2102.04942, 39(4), 2020
arXiv 2020
-
[30]
J. Li, J. Wu, and C. K. Liu. Object motion guided human motion synthesis.ACM Trans. Graph., 42(6), 2023. 10
2023
-
[31]
Mason, S
I. Mason, S. Starke, and T. Komura. Real-time style modelling of human locomotion via feature-wise transformations and local motion phases.Proceedings of the ACM on Computer Graphics and Interactive Techniques, 5(1):1–18, 2022
2022
-
[32]
C. Guo, I. Hwang, J. Wang, and B. Zhou. Snapmogen: Human motion generation from ex- pressive texts, 2025. URLhttps://arxiv.org/abs/2507.09122
arXiv 2025
-
[33]
R. Li, J. Zhao, Y . Zhang, M. Su, Z. Ren, H. Zhang, Y . Tang, and X. Li. Finedance: A fine-grained choreography dataset for 3d full body dance generation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10234–10243, 2023
2023
-
[34]
R. Li, S. Yang, D. A. Ross, and A. Kanazawa. Learn to dance with aist++: Music conditioned 3d dance generation, 2021
2021
- [35]
-
[36]
J. Lin, A. Zeng, S. Lu, Y . Cai, R. Zhang, H. Wang, and L. Zhang. Motion-x: A large-scale 3d expressive whole-body human motion dataset.Advances in Neural Information Processing Systems, 2023
2023
-
[37]
B. Kim, H. I. Jeong, J. Sung, Y . Cheng, J. Lee, J. Y . Chang, S.-I. Choi, Y . Choi, S. Shin, J. Kim, and H. J. Chang. Personabooth: Personalized text-to-motion generation.arXiv preprint arXiv:2503.07390, 2025
arXiv 2025
-
[38]
K. Chen, Z. Tan, J. Lei, S.-H. Zhang, Y .-C. Guo, W. Zhang, and S.-M. Hu. Choreomas- ter: choreography-oriented music-driven dance synthesis.ACM Trans. Graph., 40(4), July
-
[39]
ISSN 0730-0301. doi:10.1145/3450626.3459932. URLhttps://doi.org/10.1145/ 3450626.3459932
-
[40]
B. Burkanova, P. J. Yazdian, C. Zhang, T. Evans, P. Tuttösí, and A. Lim. Salsa as a nonverbal embodied language – the compas3d dataset and benchmarks, 2025. URLhttps://arxiv. org/abs/2507.19684
Pith/arXiv arXiv 2025
-
[41]
N. Le, T. Pham, T. Do, E. Tjiputra, Q. D. Tran, and A. Nguyen. Music-driven group choreog- raphy. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion, 2023
2023
-
[42]
Y . Ze, J. P. Araújo, J. Wu, and C. K. Liu. Gmr: General motion retargeting, 2025. URL https://github.com/YanjieZe/GMR. GitHub repository
2025
-
[43]
J. P. Araujo, Y . Ze, P. Xu, J. Wu, and C. K. Liu. Retargeting matters: General motion retargeting for humanoid motion tracking.arXiv preprint arXiv:2510.02252, 2025
arXiv 2025
-
[44]
B. Seed. Seed1. 8 model card: Towards generalized real-world agency.arXiv preprint arXiv:2603.20633, 2026
Pith/arXiv arXiv 2026
-
[45]
Todorov, T
E. Todorov, T. Erez, and Y . Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–
2012
- [46]
-
[47]
Perez, F
E. Perez, F. Strub, H. De Vries, V . Dumoulin, and A. Courville. Film: Visual reasoning with a general conditioning layer. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018
2018
-
[48]
Y . Du, S. Yang, B. Dai, H. Dai, O. Nachum, J. Tenenbaum, D. Schuurmans, and P. Abbeel. Learning universal policies via text-guided video generation.Advances in neural information processing systems, 36:9156–9172, 2023. 11
2023
-
[49]
Y . Du, S. Yang, P. Florence, F. Xia, A. Wahid, P. Sermanet, T. Yu, P. Abbeel, J. B. Tenen- baum, L. Kaelbling, et al. Video language planning. InInternational Conference on Learning Representations, volume 2024, pages 31138–31155, 2024
2024
-
[50]
T. Li and K. He. Back to basics: Let denoising generative models denoise.arXiv preprint arXiv:2511.13720, 2025
Pith/arXiv arXiv 2025
-
[51]
Y . Lu, S. Lu, Q. Sun, H. Zhao, Z. Jiang, X. Wang, T. Li, Z. Geng, and K. He. One-step latent-free image generation with pixel mean flows.arXiv preprint arXiv:2601.22158, 2026
Pith/arXiv arXiv 2026
-
[52]
Raffel, N
C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y . Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer.Journal of machine learning research, 21(140):1–67, 2020
2020
-
[53]
Zhang, Y
J. Zhang, Y . Zhang, X. Cun, Y . Zhang, H. Zhao, H. Lu, X. Shen, and Y . Shan. Generating human motion from textual descriptions with discrete representations. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14730–14740, 2023
2023
-
[54]
Tseng, R
J. Tseng, R. Castellon, and K. Liu. Edge: Editable dance generation from music. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 448–458, 2023
2023
-
[55]
Jiang, P
J. Jiang, P. Streli, H. Qiu, A. Fender, L. Laich, P. Snape, and C. Holz. Avatarposer: Articulated full-body pose tracking from sparse motion sensing. InEuropean conference on computer vision, pages 443–460. Springer, 2022
2022
-
[56]
X. Yi, Y . Zhou, M. Habermann, S. Shimada, V . Golyanik, C. Theobalt, and F. Xu. Physical inertial poser (pip): Physics-aware real-time human motion tracking from sparse inertial sen- sors. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13167–13178, 2022
2022
- [57]
-
[58]
R. Nai, B. Zheng, J. Zhao, H. Zhu, S. Dai, Z. Chen, Y . Hu, Y . Hu, T. Zhang, C. Wen, et al. Hu- manoid manipulation interface: Humanoid whole-body manipulation from robot-free demon- strations.arXiv preprint arXiv:2602.06643, 2026
arXiv 2026
-
[59]
Y . Wen, Q. Shuai, D. Kang, J. Li, C. Wen, Y . Qian, N. Jiao, C. Chen, W. Chen, Y . Wang, et al. Hy-motion 1.0: Scaling flow matching models for text-to-motion generation.arXiv preprint arXiv:2512.23464, 2025
arXiv 2025
-
[60]
R. Li, Y . Zhang, Y . Zhang, H. Zhang, J. Guo, Y . Zhang, Y . Liu, and X. Li. Lodge: A coarse to fine diffusion network for long dance generation guided by the characteristic dance primitives. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1524–1534, 2024
2024
-
[61]
Siyao, W
L. Siyao, W. Yu, T. Gu, C. Lin, Q. Wang, C. Qian, C. C. Loy, and Z. Liu. Bailando: 3d dance generation by actor-critic gpt with choreographic memory. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11050–11059, 2022
2022
-
[62]
Q. Zhao, K. Yang, X. Wang, S. Zhao, Y . Lu, X. Zhang, Q. Shen, X.-X. Long, and X. Cao. Make tracking easy: Neural motion retargeting for humanoid whole-body control.arXiv preprint arXiv:2603.22201, 2026
Pith/arXiv arXiv 2026
-
[63]
Z. Luo, J. Cao, A. W. Winkler, K. Kitani, and W. Xu. Perpetual humanoid control for real-time simulated avatars. InInternational Conference on Computer Vision (ICCV), 2023. 12
2023
-
[64]
L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco- manipulation and scene interaction.arXiv preprint arXiv:2509.26633, 2025
Pith/arXiv arXiv 2025
-
[65]
W. Zeng, S. Lu, K. Yin, X. Niu, M. Dai, J. Wang, and J. Pang. Behavior foundation model for humanoid robots.arXiv preprint arXiv:2509.13780, 2025
arXiv 2025
-
[66]
M. Yuan, T. Yu, W. Ge, X. Yao, D. Li, H. Wang, J. Chen, B. Li, W. Zhang, W. Zeng, et al. A sur- vey of behavior foundation model: Next-generation whole-body control system of humanoid robots.IEEE transactions on pattern analysis and machine intelligence, 2025
2025
-
[67]
A. Tirinzoni, A. Touati, J. Farebrother, M. Guzek, A. Kanervisto, Y . Xu, A. Lazaric, and M. Pirotta. Zero-shot whole-body humanoid control via behavioral foundation models.arXiv preprint arXiv:2504.11054, 2025
arXiv 2025
-
[68]
Y . Li, Z. Luo, T. Zhang, C. Dai, A. Kanervisto, A. Tirinzoni, H. Weng, K. Kitani, M. Guzek, A. Touati, et al. Bfm-zero: A promptable behavioral foundation model for humanoid control using unsupervised reinforcement learning.arXiv preprint arXiv:2511.04131, 2025
arXiv 2025
-
[69]
Z. Tao, Z. Su, P. Liu, J. Sun, W. Que, J. Ma, J. Yu, J. Cao, P. Sun, H. Liang, et al. Hera- cles: Bridging precise tracking and generative synthesis for general humanoid control.arXiv preprint arXiv:2603.27756, 2026
arXiv 2026
-
[70]
Loper, N
M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. SMPL: A skinned multi- person linear model.ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 34(6):248:1– 248:16, Oct. 2015
2015
-
[71]
Pavlakos, V
G. Pavlakos, V . Choutas, N. Ghorbani, T. Bolkart, A. A. A. Osman, D. Tzionas, and M. J. Black. Expressive body capture: 3d hands, face, and body from a single image. InProceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019
2019
-
[72]
Y . Zhou, C. Barnes, J. Lu, J. Yang, and H. Li. On the continuity of rotation representations in neural networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019
2019
-
[73]
J. Su, M. Ahmed, Y . Lu, S. Pan, W. Bo, and Y . Liu. Roformer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024
2024
-
[74]
J. Song, C. Meng, and S. Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020
Pith/arXiv arXiv 2010
-
[75]
F. Liu, S. Zhang, X. Wang, Y . Wei, H. Qiu, Y . Zhao, Y . Zhang, Q. Ye, and F. Wan. Timestep embedding tells: It’s time to cache for video diffusion model. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 7353–7363, 2025
2025
-
[76]
I. Loshchilov and F. Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017. 13 A Extended Related Work Behavior Foundation Models for Humanoid Robots.Going beyond isolated skills, recent works have started to explore systems that capture broad, reusable behavioral knowledge for humanoid robots, often referred to asBehavior Fo...
Pith/arXiv arXiv 2017
-
[77]
[Action Requirements]
action [Segmentation Criteria] You should decide whether to split mainly based on: - whether the motion style changes - whether a finer motion style / type can be judged to have changed - whether the main action changes If the motion remains continuous and no obvious change occurs, do not over-segment. [Action Requirements]
-
[78]
action must be written in English
-
[79]
action should summarize the concrete motion performed in this segment
-
[80]
action should describe the motion itself as much as possible, and should not write overly abstract content
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.