arXiv preprint arXiv:2506.24086

Zhu, B · 2025 · arXiv 2506.24086

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body

cs.CV · 2025-12-16 · unverdicted · novelty 7.0

ViBES introduces a speech-language-behavior model using modality-specific transformer experts that jointly generates dialogue and 3D body actions, showing gains over separate co-speech and text-to-motion baselines on multi-turn metrics.

SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-based Humanoid Control

cs.GR · 2026-05-21 · unverdicted · novelty 6.0

SCRIPT presents a scalable diffusion policy with JAST-DiT architecture, nonlinear history conditioning, and RLHR post-training that claims to outperform prior methods on text alignment, motion quality, and physical realism while scaling on a 1200-hour dataset.

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

IMU-to-4D uses wearable IMU data and repurposed LLMs to predict coherent 4D human motion plus coarse scene structure, outperforming cascaded state-of-the-art pipelines in temporal stability.

LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens

cs.CV · 2026-02-12 · unverdicted · novelty 6.0

LLaMo scales pretrained LLMs for unified motion-language tasks by encoding motion into continuous causal latents and adding a flow-matching head for real-time autoregressive generation and captioning.

Exploring Motion-Language Alignment for Text-driven Motion Generation

cs.CV · 2026-04-03 · unverdicted · novelty 5.0

MLA-Gen advances text-driven motion synthesis by aligning global motion patterns with fine-grained text semantics and mitigating attention sink effects via new masking techniques.

Towards Continual Motion-Language Agents: LoRA Variants for Incremental Motion Understanding and Generation

cs.LG · 2026-06-29 · unverdicted · novelty 4.0

Proposes LoRA-based mixture-of-experts with autoencoder routing for continual bidirectional motion-language learning, reporting near-zero forgetting on a 5-task HumanML3D benchmark derived via semantic clustering.

UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars

cs.GR · 2026-05-14 · unverdicted · novelty 4.0

UMo presents a sparse MoE-based unified model for real-time co-speech avatar animation that claims superior quality under latency constraints via keyframe-centric design and multi-stage audio-augmented training.

Encoder-Free Human Motion Understanding via Structured Motion Descriptions

cs.CV · 2026-04-23

citing papers explorer

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

cs.CV · 2026-04-23 · unverdicted · none · ref 121

IMU-to-4D uses wearable IMU data and repurposed LLMs to predict coherent 4D human motion plus coarse scene structure, outperforming cascaded state-of-the-art pipelines in temporal stability.
