citation dossier

Advancing open-source world models.arXiv preprint arXiv:2601.20540

R · 2026 · arXiv 2601.20540

16Pith papers citing it

16reference links

cs.CVtop field · 10 papers

UNVERDICTEDtop verdict bucket · 15 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 16 reviewed papers. Its strongest current cluster is cs.CV (10 papers). The largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation

cs.CV · 2026-05-12 · conditional · novelty 7.0

HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.

Being-H0.7: A Latent World-Action Model from Egocentric Videos

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.

ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

ExoActor uses exocentric video generation to implicitly model robot-environment-object interactions and converts the resulting videos into task-conditioned humanoid control sequences.

Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Infrastructure-centric world models use roadside sensors' temporal depth to complement vehicle spatial breadth for better traffic simulation and prediction.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

From Synchrony to Sequence: Exo-to-Ego Generation via Interpolation

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

Interpolating exo and ego videos into a single continuous sequence lets diffusion sequence models generate more coherent first-person videos than direct conditioning, even without pose interpolation.

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

SpatialEdit provides a benchmark, large synthetic dataset, and baseline model for precise object and camera spatial manipulations in images, with the model beating priors on spatial editing.

UNICA: A Unified Neural Framework for Controllable 3D Avatars

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

UNICA unifies motion planning, rigging, physical simulation, and rendering into a single skeleton-free neural framework that produces next-frame 3D avatar geometry from action inputs and renders it with Gaussian splatting.

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

eess.IV · 2026-03-30 · unverdicted · novelty 6.0

Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.

Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models

cs.RO · 2026-05-08 · unverdicted · novelty 5.0

Action-state consistency in World Action Models distinguishes successful from failed imagined futures and supports value-free selection of better rollouts via consensus among predictions.

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

cs.GR · 2026-05-05 · unverdicted · novelty 4.0

JoyAI-Image unifies visual understanding, generation, and editing in one model and claims stronger spatial intelligence through bidirectional perception-generation loops.

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

cs.CV · 2026-04-15 · unverdicted · novelty 4.0

HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

cs.CV · 2026-04-10 · unverdicted · novelty 4.0

Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

cs.CV · 2026-04-06 · unverdicted · novelty 4.0

OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.

citing papers explorer

Showing 16 of 16 citing papers.

HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation cs.CV · 2026-05-12 · conditional · none · ref 21
HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.
Being-H0.7: A Latent World-Action Model from Egocentric Videos cs.RO · 2026-04-30 · unverdicted · none · ref 8
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
MultiWorld: Scalable Multi-Agent Multi-View Video World Models cs.CV · 2026-04-20 · unverdicted · none · ref 41
MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control cs.RO · 2026-04-30 · unverdicted · none · ref 22
ExoActor uses exocentric video generation to implicitly model robot-environment-object interactions and converts the resulting videos into task-conditioned humanoid control sequences.
Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception cs.CV · 2026-04-19 · unverdicted · none · ref 29
Infrastructure-centric world models use roadside sensors' temporal depth to complement vehicle spatial breadth for better traffic simulation and prediction.
Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 170
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
From Synchrony to Sequence: Exo-to-Ego Generation via Interpolation cs.CV · 2026-04-15 · unverdicted · none · ref 44
Interpolating exo and ego videos into a single continuous sequence lets diffusion sequence models generate more coherent first-person videos than direct conditioning, even without pose interpolation.
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling cs.CV · 2026-04-08 · unverdicted · none · ref 78
INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing cs.CV · 2026-04-06 · unverdicted · none · ref 47
SpatialEdit provides a benchmark, large synthetic dataset, and baseline model for precise object and camera spatial manipulations in images, with the model beating priors on spatial editing.
UNICA: A Unified Neural Framework for Controllable 3D Avatars cs.CV · 2026-04-03 · unverdicted · none · ref 17
UNICA unifies motion planning, rigging, physical simulation, and rendering into a single skeleton-free neural framework that produces next-frame 3D avatar geometry from action inputs and renders it with Gaussian splatting.
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms eess.IV · 2026-03-30 · unverdicted · none · ref 235
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models cs.RO · 2026-05-08 · unverdicted · none · ref 33
Action-state consistency in World Action Models distinguishes successful from failed imagined futures and supports value-free selection of better rollouts via consensus among predictions.
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation cs.GR · 2026-05-05 · unverdicted · none · ref 74
JoyAI-Image unifies visual understanding, generation, and editing in one model and claims stronger spatial intelligence through bidirectional perception-generation loops.
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds cs.CV · 2026-04-15 · unverdicted · none · ref 61
HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory cs.CV · 2026-04-10 · unverdicted · none · ref 38
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models cs.CV · 2026-04-06 · unverdicted · none · ref 116
OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.

Advancing open-source world models.arXiv preprint arXiv:2601.20540

why this work matters in Pith

fields

years

verdicts

representative citing papers

citing papers explorer