SnapMoGen: Human Motion Generation from Expressive Texts

SnapMoGen: Human Motion Generation from Expressive Texts · 2025 · arXiv 2507.09122

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

MoGeFlow: Flowing Through Motion Codebook Geometry for Text-to-Motion Generation

cs.GR · 2026-06-10 · unverdicted · novelty 6.0

MoGeFlow learns text-conditioned flows over PartVQ group-specific code embeddings to generate motions, achieving SOTA R-Precision on HumanML3D and KIT-ML while preserving discrete token validity.

Multi-scale Coarse-to-fine Modeling for Test-time Human Motion Control

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

MSCoT uses multi-scale hierarchical token prediction, multi-scale guidance, and a token refiner to deliver SOTA text-to-motion control with 48% FID gain, 61% lower error, and 10x faster inference on HumanML3D.

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

IMU-to-4D uses wearable IMU data and repurposed LLMs to predict coherent 4D human motion plus coarse scene structure, outperforming cascaded state-of-the-art pipelines in temporal stability.

LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens

cs.CV · 2026-02-12 · unverdicted · novelty 6.0

LLaMo scales pretrained LLMs for unified motion-language tasks by encoding motion into continuous causal latents and adding a flow-matching head for real-time autoregressive generation and captioning.

InterCMDM: Block-Causal Diffusion for Autoregressive Human Interaction Generation

cs.CV · 2026-07-02 · unverdicted · novelty 5.0

InterCMDM proposes a block-causal latent diffusion framework with dual-stream causal transformers and multi-task attention masks for autoregressive text-conditioned two-person interaction generation and reports SOTA results on InterHuman and Inter-X.

OMG: Omni-Modal Motion Generation for Generalist Humanoid Control

cs.RO · 2026-06-09 · unverdicted · novelty 5.0

OMG is a diffusion model for omni-modal whole-body humanoid motion generation that uses language, audio, and reference motions after large-scale data curation to achieve state-of-the-art performance and adaptation.

Next-Scale Autoregressive Models for Text-to-Motion Generation

cs.CV · 2026-04-04

citing papers explorer

Showing 7 of 7 citing papers.

MoGeFlow: Flowing Through Motion Codebook Geometry for Text-to-Motion Generation · cs.GR · 2026-06-10 · unverdicted · none · ref 26
Multi-scale Coarse-to-fine Modeling for Test-time Human Motion Control · cs.CV · 2026-05-14 · unverdicted · none · ref 13
Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs · cs.CV · 2026-04-23 · unverdicted · none · ref 23
LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens · cs.CV · 2026-02-12 · unverdicted · none · ref 19
InterCMDM: Block-Causal Diffusion for Autoregressive Human Interaction Generation · cs.CV · 2026-07-02 · unverdicted · none · ref 111
OMG: Omni-Modal Motion Generation for Generalist Humanoid Control · cs.RO · 2026-06-09 · unverdicted · none · ref 32
Next-Scale Autoregressive Models for Text-to-Motion Generation · cs.CV · 2026-04-04 · unreviewed · ref 13
