
Matrix-Game 2.0: An open-source real-time and streaming interactive world model

2025 · cs.CV · arXiv 2508.13009

Canonical reference. 91% of citing Pith papers cite this work as background.

58 Pith papers citing it



abstract

Recent advances in interactive video generation have demonstrated diffusion models' potential as world models by capturing complex physical dynamics and interactive behaviors. However, existing interactive world models depend on bidirectional attention and lengthy inference steps, severely limiting real-time performance. Consequently, they struggle to simulate real-world dynamics, where outcomes must update instantaneously based on historical context and current actions. To address this, we present Matrix-Game 2.0, an interactive world model that generates long videos on-the-fly via few-step auto-regressive diffusion. Our framework consists of three key components: (1) a scalable data production pipeline for Unreal Engine and GTA5 environments that produces massive amounts (about 1200 hours) of video data with diverse interaction annotations; (2) an action injection module that enables frame-level mouse and keyboard inputs as interactive conditions; (3) a few-step distillation based on the causal architecture for real-time and streaming video generation. Matrix-Game 2.0 can generate high-quality minute-level videos across diverse scenes at an ultra-fast speed of 25 FPS. We open-source our model weights and codebase to advance research in interactive world modeling.


citation-role summary

background 10 · baseline 1

citation-polarity summary

background 10 · baseline 1

representative citing papers

Multiplayer Interactive World Models with Representation Autoencoders

cs.CV · 2026-07-06 · accept · novelty 7.0

A 5B-parameter latent diffusion model generates real-time four-player Rocket League matches conditioned on all players' actions, staying stable far beyond its training horizon.

WorldRoamBench: An Open-World Benchmark for Long-Horizon Stability of Interactive World Models

cs.CV · 2026-06-30 · accept · novelty 7.0 · 2 refs

A 600+ case open-world benchmark finds no interactive world model is simultaneously action-faithful, visually stable, physically plausible, and memory-consistent over long WASD interaction.

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

NeuWorld uses a transformer VAE to learn compact Neural Implicit Scenes from sparse posed frames and a diffusion transformer to evolve them conditioned on camera trajectories for consistent interactive exploration.

MemoBench: Benchmarking World Modeling in Dynamically Changing Environments

cs.CV · 2026-06-25 · conditional · novelty 7.0 · 4 refs

None of ten tested video-generation models reliably remembers objects after occlusion in dynamic scenes; static-camera videos inflate consistency scores.

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

SPAWN enables training-free insertion of custom visual concepts into autoregressive world models by swapping the pinned context-memory anchor over a short injection window.

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.

Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models

cs.CV · 2026-05-18 · conditional · novelty 7.0

Per-frame natural-language action prompts enable simultaneous multi-entity control and cross-entity action transfer in interactive video world models, outperforming discrete action-index interfaces.

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

cs.CV · 2026-05-03 · unverdicted · novelty 7.0 · 2 refs

M²-REPA decouples modality-specific features from diffusion intermediates and aligns them to complementary expert foundation models via a multi-modal alignment loss and modality-specific decoupling regularization for improved multimodal video generation.

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

WorldMark is the first public benchmark that standardizes scenes, trajectories, and control interfaces across heterogeneous interactive image-to-video world models.

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

MoRight: Motion Control Done Right

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.

One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

cs.CV · 2025-11-28 · unverdicted · novelty 7.0

One-to-All Animation enables alignment-free character animation and image pose transfer via self-supervised outpainting reformulation, reference extraction, hybrid fusion attention, identity-robust pose control, and token replacement for long videos.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

Causal-rCM unifies teacher-forcing and self-forcing distillation for autoregressive video diffusion, delivering a 2-step model with VBench-T2V score 84.63 and enabling interactive world models on Cosmos 3 using only synthetic data.

SurgVista: Long-Horizon Surgical World Modeling with Plausible Instrument-Tissue Dynamics

cs.CV · 2026-06-18 · unverdicted · novelty 6.0

SurgVista mitigates spatial interaction incoherence and temporal fidelity collapse in surgical world models through trajectory-based contrastive regularization and drift-perturbed training, outperforming prior methods on a new long-horizon benchmark.

ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

cs.CV · 2026-06-16 · unverdicted · novelty 6.0

ActWorld extends navigation-centric world models to support mid-rollout object interactions via chunk-autoregressive generation, action-aware memory routing, and a persistent memory bank, backed by a 100K annotated interaction dataset.

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

cs.CV · 2026-06-15 · unverdicted · novelty 6.0

PermaVid disentangles spatial context into semantic appearance and geometric structure via multi-modal memory banks and edit-aware updates to maintain long-term consistency in video generation after edits.

WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

cs.RO · 2026-06-11 · unverdicted · novelty 6.0

WEAVER is a multi-view world model using flow-matching that jointly satisfies fidelity, consistency, and efficiency for robotic manipulation, yielding 0.87 correlation with real success and policy gains on hardware.

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

cs.CV · 2026-06-11 · unverdicted · novelty 6.0

MoVerse generates real-time interactive video world models from single narrow-FOV images via panoramic diffusion expansion, Gaussian scaffold lifting, and distillation of a bidirectional diffusion teacher into a causal autoregressive renderer.

Prisma-World: Camera-Controllable Multi-Agent Video World Model

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

Prisma-World is a diffusion-based multi-agent video model that uses joint full-attention, multi-agent RoPE, and relative camera geometry injection plus curriculum training to produce consistent cross-view videos from flexible agent counts.

DisCo: World Models with Discrete Camera Motion Control

cs.CV · 2026-06-06 · unverdicted · novelty 6.0

DisCo uses discrete action primitives for camera control in video world models to achieve more reliable action following than continuous trajectories.

Streaming Video Generation with Streaming Force Control

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

StreamForce presents a unified causal model for force-controllable streaming video generation using a new force representation and distillation pipeline, claiming SOTA force adherence and 16.6 FPS performance.

WorldFly: A World-Model-Based Vision-Language-Action Model for UAV Navigation

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

WorldFly integrates a world model into a VLA framework via dual-branch coupled flow matching to jointly generate future videos and actions, outperforming baselines on an urban canyon traversal benchmark especially in unseen environments.

citing papers explorer

Showing 50 of 58 citing papers.

Multiplayer Interactive World Models with Representation Autoencoders cs.CV · 2026-07-06 · accept · none · ref 218 · internal anchor
A 5B-parameter latent diffusion model generates real-time four-player Rocket League matches conditioned on all players' actions, staying stable far beyond its training horizon.
WorldRoamBench: An Open-World Benchmark for Long-Horizon Stability of Interactive World Models cs.CV · 2026-06-30 · accept · none · ref 9 · 2 links · internal anchor
A 600+ case open-world benchmark finds no interactive world model is simultaneously action-faithful, visually stable, physically plausible, and memory-consistent over long WASD interaction.
Walking in the Implicit: Interactive World Exploration via Neural Scene Representation cs.CV · 2026-06-29 · unverdicted · none · ref 53 · internal anchor
NeuWorld uses a transformer VAE to learn compact Neural Implicit Scenes from sparse posed frames and a diffusion transformer to evolve them conditioned on camera trajectories for consistent interactive exploration.
MemoBench: Benchmarking World Modeling in Dynamically Changing Environments cs.CV · 2026-06-25 · conditional · none · ref 17 · 4 links · internal anchor
None of ten tested video-generation models reliably remembers objects after occlusion in dynamic scenes; static-camera videos inflate consistency scores.
From Zero to Hero: Training-Free Custom Concept Spawning in World Models cs.CV · 2026-06-01 · unverdicted · none · ref 8 · internal anchor
SPAWN enables training-free insertion of custom visual concepts into autoregressive world models by swapping the pinned context-memory anchor over a short injection window.
MBench: A Comprehensive Benchmark on Memory Capability for Video World Models cs.CV · 2026-05-30 · unverdicted · none · ref 26 · internal anchor
MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation cs.CV · 2026-05-25 · unverdicted · none · ref 9 · internal anchor
WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.
Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models cs.CV · 2026-05-18 · conditional · none · ref 20 · internal anchor
Per-frame natural-language action prompts enable simultaneous multi-entity control and cross-entity action transfer in interactive video world models, outperforming discrete action-index interfaces.
Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models cs.CV · 2026-05-03 · unverdicted · none · ref 12 · 2 links · internal anchor
M²-REPA decouples modality-specific features from diffusion intermediates and aligns them to complementary expert foundation models via a multi-modal alignment loss and modality-specific decoupling regularization for improved multimodal video generation.
WorldMark: A Unified Benchmark Suite for Interactive Video World Models cs.CV · 2026-04-23 · unverdicted · none · ref 15 · internal anchor
WorldMark is the first public benchmark that standardizes scenes, trajectories, and control interfaces across heterogeneous interactive image-to-video world models.
Efficient Video Diffusion Models: Advancements and Challenges cs.CV · 2026-04-17 · unverdicted · none · ref 290 · internal anchor
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
MoRight: Motion Control Done Right cs.CV · 2026-04-08 · unverdicted · none · ref 32 · internal anchor
MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer cs.CV · 2025-11-28 · unverdicted · none · ref 12 · internal anchor
One-to-All Animation enables alignment-free character animation and image pose transfer via self-supervised outpainting reformulation, reference extraction, hybrid fusion attention, identity-robust pose control, and token replacement for long videos.
Training Agents Inside of Scalable World Models cs.AI · 2025-09-29 · conditional · none · ref 9 · internal anchor
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models cs.CV · 2026-06-24 · unverdicted · none · ref 19 · internal anchor
Causal-rCM unifies teacher-forcing and self-forcing distillation for autoregressive video diffusion, delivering a 2-step model with VBench-T2V score 84.63 and enabling interactive world models on Cosmos 3 using only synthetic data.
SurgVista: Long-Horizon Surgical World Modeling with Plausible Instrument-Tissue Dynamics cs.CV · 2026-06-18 · unverdicted · none · ref 16 · internal anchor
SurgVista mitigates spatial interaction incoherence and temporal fidelity collapse in surgical world models through trajectory-based contrastive regularization and drift-perturbed training, outperforming prior methods on a new long-horizon benchmark.
ActWorld: From Explorable to Interactive World Model via Action-Aware Memory cs.CV · 2026-06-16 · unverdicted · none · ref 6 · internal anchor
ActWorld extends navigation-centric world models to support mid-rollout object interactions via chunk-autoregressive generation, action-aware memory routing, and a persistent memory bank, backed by a 100K annotated interaction dataset.
PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory cs.CV · 2026-06-15 · unverdicted · none · ref 43 · internal anchor
PermaVid disentangles spatial context into semantic appearance and geometric structure via multi-modal memory banks and edit-aware updates to maintain long-term consistency in video generation after edits.
WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation cs.RO · 2026-06-11 · unverdicted · none · ref 18 · internal anchor
WEAVER is a multi-view world model using flow-matching that jointly satisfies fidelity, consistency, and efficiency for robotic manipulation, yielding 0.87 correlation with real success and policy gains on hardware.
MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold cs.CV · 2026-06-11 · unverdicted · none · ref 13 · internal anchor
MoVerse generates real-time interactive video world models from single narrow-FOV images via panoramic diffusion expansion, Gaussian scaffold lifting, and distillation of a bidirectional diffusion teacher into a causal autoregressive renderer.
Prisma-World: Camera-Controllable Multi-Agent Video World Model cs.CV · 2026-06-08 · unverdicted · none · ref 5 · internal anchor
Prisma-World is a diffusion-based multi-agent video model that uses joint full-attention, multi-agent RoPE, and relative camera geometry injection plus curriculum training to produce consistent cross-view videos from flexible agent counts.
DisCo: World Models with Discrete Camera Motion Control cs.CV · 2026-06-06 · unverdicted · none · ref 15 · internal anchor
DisCo uses discrete action primitives for camera control in video world models to achieve more reliable action following than continuous trajectories.
Streaming Video Generation with Streaming Force Control cs.CV · 2026-06-05 · unverdicted · none · ref 20 · internal anchor
StreamForce presents a unified causal model for force-controllable streaming video generation using a new force representation and distillation pipeline, claiming SOTA force adherence and 16.6 FPS performance.
WorldFly: A World-Model-Based Vision-Language-Action Model for UAV Navigation cs.AI · 2026-06-04 · unverdicted · none · ref 6 · internal anchor
WorldFly integrates a world model into a VLA framework via dual-branch coupled flow matching to jointly generate future videos and actions, outperforming baselines on an urban canyon traversal benchmark especially in unseen environments.
Geometry-Aware Implicit Memory for Video World Models cs.CV · 2026-06-01 · unverdicted · none · ref 21 · internal anchor
GIM-World adds a camera-queryable geometry distillation head and pruning rule to implicit memory in video world models, claiming better long-horizon geometric consistency on the MIND benchmark than explicit and implicit baselines.
Robust Dreamer: Deviation-Aware Latent Gaussian Memory for Action-Controlled AR Video Generation cs.CV · 2026-05-29 · unverdicted · none · ref 21 · internal anchor
Robust Dreamer uses Latent Gaussian Memory anchored to diffusion latents and Deviation Learning with a Dynamic Deviation Archive to reduce drift in long-horizon action-controlled image-to-video generation, reporting SOTA results on ScanNet, DL3DV, and OmniWorldGame.
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models cs.CV · 2026-05-28 · unverdicted · none · ref 18 · internal anchor
minWM supplies an end-to-end pipeline that fine-tunes bidirectional T2V/TI2V models with camera control then distills them via Causal Forcing into few-step autoregressive generators for low-latency rollout.
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players cs.CV · 2026-05-27 · unverdicted · none · ref 19 · internal anchor
A multi-agent video world model using simplex rotary agent encoding and sparse hub attention achieves better fidelity, controllability, and consistency than baselines while generalizing from 2 to 4 players.
DexSIM: Real-time Dexterous Simulation with Unified Causal Video Diffusion cs.CV · 2026-05-23 · unverdicted · none · ref 3 · internal anchor
DexSIM is a bi-directional video diffusion model with hand trajectory embedding and spatial memory cache for real-time dexterous hand-object simulation at 15 FPS.
WorldKV: Efficient World Memory with World Retrieval and Compression cs.CV · 2026-05-21 · unverdicted · none · ref 6 · internal anchor
WorldKV enables persistent world memory in autoregressive video diffusion models by selectively retrieving and compressing KV-cache chunks, matching full-cache fidelity at roughly twice the throughput without training.
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks cs.CV · 2026-05-19 · unverdicted · none · ref 50 · internal anchor
Proposes World-Ego Modeling with WEM using CP-MoE diffusion and a new HTEWorld benchmark, claiming SOTA on hybrid navigation-manipulation tasks.
Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation cs.CV · 2026-05-13 · unverdicted · none · ref 7 · internal anchor
Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.
Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation cs.CV · 2026-04-20 · unverdicted · none · ref 17 · internal anchor
A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.
Lyra 2.0: Explorable Generative 3D Worlds cs.CV · 2026-04-14 · unverdicted · none · ref 28 · internal anchor
Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling cs.CV · 2026-04-08 · unverdicted · none · ref 32 · internal anchor
INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.
UNICA: A Unified Neural Framework for Controllable 3D Avatars cs.CV · 2026-04-03 · unverdicted · none · ref 21 · internal anchor
UNICA unifies motion planning, rigging, physical simulation, and rendering into a single skeleton-free neural framework that produces next-frame 3D avatar geometry from action inputs and renders it with Gaussian splatting.
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization cs.LG · 2026-02-03 · unverdicted · none · ref 3 · internal anchor
Quant VideoGen reduces KV cache memory by up to 7 times in autoregressive video diffusion models via semantic aware smoothing and progressive residual quantization, achieving better quality than baselines with under 4% latency overhead.
AstraNav-World: World Model for Foresight Control and Consistency cs.CV · 2025-12-25 · unverdicted · none · ref 10 · internal anchor
AstraNav-World unifies diffusion video generation and vision-language action planning in a single bidirectional model that improves trajectory accuracy, success rates, and zero-shot real-world adaptation in embodied navigation.
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling cs.CV · 2025-12-16 · conditional · none · ref 17 · internal anchor
A real-time video diffusion world model that uses dual action control, reframed position encodings, and context-aligned distillation to keep generated environments consistent over hundreds of frames.
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation cs.CV · 2025-10-02 · conditional · none · ref 15 · internal anchor
Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.
Infinite Worlds with Versatile Interactions cs.CV · 2026-07-08 · conditional · none · ref 22 · internal anchor
An open-source causal video world model sustains hour-long, 720p/60fps interactive generation without visual drift, paired with a VLM-based director-pilot agentic harness for rich, open-ended interaction.
WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory cs.CV · 2026-07-02 · unverdicted · none · ref 20 · internal anchor
A video world model framework that uses LLM-orchestrated 3D trajectories as control signals for generation to achieve persistent dynamic object memory and viewpoint freedom.
WorldOlympiad: Can Your World Model Survive a Triathlon? cs.CV · 2026-06-09 · unverdicted · none · ref 11 · internal anchor
WorldOlympiad is a new benchmark decomposing world-model evaluation into physical, geometry, and interaction tracks using segmentation, MLLM judges, Gaussian splatting, and action prompts on diverse scenarios.
DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory cs.CV · 2026-05-29 · unverdicted · none · ref 11 · internal anchor
DecMem proposes a decoupled memory system using sparse global and anchored local components to enable consistent minute-long controllable video generation in world models.
WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models cs.CV · 2026-05-24 · unverdicted · none · ref 5 · internal anchor
WorldCraft introduces NWT, SP-LoRA, and TASP to enable object trajectory control in video-based world models while preserving camera navigation.
OrbiSim: World Models as Differentiable Physics Engines for Embodied Intelligence cs.RO · 2026-05-12 · unverdicted · none · ref 17 · internal anchor
OrbiSim builds a differentiable physics engine from world models to support gradient-based policy optimization and contact modeling in robotics.
Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse cs.CV · 2026-05-11 · unverdicted · none · ref 67 · 2 links · internal anchor
The paper organizes research on generalist game AI into Dataset, Model, Harness, and Benchmark pillars and charts a five-level progression from single-game mastery to agents that create and live inside game multiverses.
Neural Computers cs.LG · 2026-04-07 · unverdicted · none · ref 15 · internal anchor
Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms eess.IV · 2026-03-30 · conditional · none · ref 42 · internal anchor
Cloning Deterministic Worlds: The Critical Role of Latent Geometry in Long-Horizon World Models cs.LG · 2025-10-30 · unverdicted · none · ref 9 · internal anchor
GRWM uses temporal contrastive learning to geometrically regularize latent spaces in world models for high-fidelity cloning of deterministic 3D worlds.
