Worldmem: Long-term consistent world simulation with memory

Zeqi Xiao, Yushi Lan, Yifan Zhou, Wenqi Ouyang, Shuai Yang, Yanhong Zeng, Xingang Pan · 2025 · arXiv 2504.12369

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

M²-REPA decouples modality-specific features inside a diffusion model and aligns each to its matching expert foundation model via an alignment loss plus a decoupling regularizer, yielding better visual quality and long-term consistency in multi-modal video generation.

CityRAG: Stepping Into a City via Spatially-Grounded Video Generation

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

CityRAG generates minutes-long 3D-consistent videos of real-world cities by grounding outputs in geo-registered data and using temporally unaligned training to disentangle fixed scenes from transient elements like weather.

Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.

Lyra 2.0: Explorable Generative 3D Worlds

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

cs.CV · 2026-04-10 · unverdicted · novelty 4.0

Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 9 of 9 citing papers.

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives cs.CV · 2026-05-12 · unverdicted · none · ref 49
CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 66
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
MultiWorld: Scalable Multi-Agent Multi-View Video World Models cs.CV · 2026-04-20 · unverdicted · none · ref 56
MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models cs.CV · 2026-05-03 · unverdicted · none · ref 48
M²-REPA decouples modality-specific features inside a diffusion model and aligns each to its matching expert foundation model via an alignment loss plus a decoupling regularizer, yielding better visual quality and long-term consistency in multi-modal video generation.
CityRAG: Stepping Into a City via Spatially-Grounded Video Generation cs.CV · 2026-04-21 · unverdicted · none · ref 65
CityRAG generates minutes-long 3D-consistent videos of real-world cities by grounding outputs in geo-registered data and using temporally unaligned training to disentangle fixed scenes from transient elements like weather.
Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation cs.CV · 2026-04-20 · unverdicted · none · ref 52
A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.
Lyra 2.0: Explorable Generative 3D Worlds cs.CV · 2026-04-14 · unverdicted · none · ref 118
Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory cs.CV · 2026-04-10 · unverdicted · none · ref 45
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
Evolution of Video Generative Foundations cs.CV · 2026-04-07 · unverdicted · none · ref 282
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

Worldmem: Long-term consistent world simulation with memory

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer