Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

2025 · cs.RO · arXiv 2511.22963

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Enabling humanoid robots to follow free-form natural language commands is a critical step toward seamless human-robot interaction and general-purpose embodied AI. However, existing methods remain limited, often constrained to simple instructions or forced to sacrifice motion diversity for physical plausibility. To address this gap, we present Humanoid-LLA, a Large Language Action model that translates unconstrained natural language directly into executable whole-body motions for humanoid robots. Our approach tackles two core challenges: paired language-humanoid motion data scarcity and physical instability. First, we bridge high-level language semantics with physically-grounded control by learning a unified human-humanoid motion vocabulary. Second, we introduce a novel two-stage fine-tuning framework that begins with supervised motion Chain-of-Thought learning, followed by reinforcement learning refined with physical feedback to ensure robustness and stability. Extensive evaluation in simulation and real-world cross-embodiment experiments demonstrates that Humanoid-LLA achieves superior generalization to novel language commands and diverse motion generation while maintaining high physical fidelity.

representative citing papers

Not All Relations Rotate Alike: Transformation-Aware Decoupling for Viewpoint-Robust 3D Scene Graph Generation

cs.CV · 2026-06-25 · unverdicted · novelty 6.0

TAD decouples 3DSGG relation reasoning by predicate transformation behavior to achieve yaw-robust predictions without rotation augmentation.

citing papers explorer


Not All Relations Rotate Alike: Transformation-Aware Decoupling for Viewpoint-Robust 3D Scene Graph Generation

cs.CV · 2026-06-25 · unverdicted · none · ref 15 · internal anchor

TAD decouples 3DSGG relation reasoning by predicate transformation behavior to achieve yaw-robust predictions without rotation augmentation.
