Proposes World-Ego Modeling with WEM using CP-MoE diffusion and a new HTEWorld benchmark, claiming SOTA on hybrid navigation-manipulation tasks.
Live: Long-horizon interactive video world modeling
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
Delta Forcing improves temporal coherence in interactive autoregressive video generation by estimating transition consistency from teacher-generator latent deltas and balancing it against a monotonic continuity objective.
SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.
citing papers explorer
-
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
Proposes World-Ego Modeling with WEM using CP-MoE diffusion and a new HTEWorld benchmark, claiming SOTA on hybrid navigation-manipulation tasks.
-
Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation
Delta Forcing improves temporal coherence in interactive autoregressive video generation by estimating transition consistency from teacher-generator latent deltas and balancing it against a monotonic continuity objective.
-
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.