Genie: Generative interactive environments

Palitha Edirisingha · 2024 · Interactive Learning Environments · DOI 10.1080/10494820.2024.2414152

Canonical reference. Of the classified citations from Pith papers, 88% (7 of 8) cite this work as background.

Cited by 23 Pith papers

3 external citations

citation-role summary

background: 7 · dataset: 1

citation-polarity summary

background: 7 · use dataset: 1

representative citing papers

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

RotVLA: Rotational Latent Action for Vision-Language-Action Model

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.

PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning

cs.RO · 2026-02-23 · unverdicted · novelty 7.0

PhysMem enables VLM-based robot planners to learn and verify physical properties through test-time interaction and hypothesis testing, raising success on a brick insertion task from 23% to 76%.

SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.

FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

FlowLong generates videos several times longer than native model windows by blending adjacent predictions with Tweedie matching to enforce manifold and temporal consistency while using stochastic noise injection early and deterministic sampling later.

ReactiveGWM: Steering NPC in Reactive Game World Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

ReactiveGWM introduces a decoupled diffusion architecture for player-NPC interactions that learns game-agnostic response logic for zero-shot strategy transfer across games.

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models

cs.RO · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

cs.RO · 2026-04-29 · unverdicted · novelty 6.0 · 2 refs

X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.

Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.

OpenGame: Open Agentic Coding for Games

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

OpenGame is the first open-source agentic framework for end-to-end web game creation, using Game Skills and GameCoder-27B to achieve state-of-the-art results on 150 prompts via a new benchmark measuring build health, visual usability, and intent alignment.

Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation

cs.RO · 2026-03-16 · unverdicted · novelty 6.0

SimDist pretrains world models in simulation and adapts them to real-world robots by updating only the latent dynamics model, enabling rapid improvement on contact-rich tasks where prior methods fail.

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

cs.CV · 2025-10-02 · conditional · novelty 6.0

Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.

stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.

One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.

Nautilus: From One Prompt to Plug-and-Play Robot Learning

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.

ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.

Coding Agent Is Good As World Simulator

cs.AI · 2026-05-14

Do Joint Audio-Video Generation Models Understand Physics?

cs.SD · 2026-05-08

VRAG: Learning World Models for Interactive Video Generation

cs.CV · 2025-05-28

citing papers explorer

Showing 23 of 23 citing papers.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
cs.AI · 2026-05-10 · accept · none · ref 11 · 2 links

Point Tracking Improves World Action Models
cs.RO · 2026-05-22 · unverdicted · none · ref 42

RotVLA: Rotational Latent Action for Vision-Language-Action Model
cs.RO · 2026-05-13 · unverdicted · none · ref 27

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
cs.LG · 2026-05-13 · unverdicted · none · ref 60

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
cs.CV · 2026-05-12 · unverdicted · none · ref 2

PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
cs.RO · 2026-02-23 · unverdicted · none · ref 12

SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
cs.CV · 2026-05-22 · unverdicted · none · ref 9

FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching
cs.CV · 2026-05-20 · unverdicted · none · ref 3

ReactiveGWM: Steering NPC in Reactive Game World Models
cs.CV · 2026-05-14 · unverdicted · none · ref 5

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
cs.RO · 2026-05-11 · unverdicted · none · ref 7 · 2 links

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
cs.RO · 2026-04-29 · unverdicted · none · ref 15 · 2 links

Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
cs.CV · 2026-04-28 · unverdicted · none · ref 3

OpenGame: Open Agentic Coding for Games
cs.SE · 2026-04-20 · unverdicted · none · ref 29

Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation
cs.RO · 2026-03-16 · unverdicted · none · ref 3

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
cs.CV · 2025-10-02 · conditional · none · ref 3

stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation
cs.LG · 2026-05-20 · unverdicted · none · ref 24

One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration
cs.CV · 2026-05-20 · unverdicted · none · ref 7

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
cs.CV · 2026-05-14 · unverdicted · none · ref 34

Nautilus: From One Prompt to Plug-and-Play Robot Learning
cs.RO · 2026-05-12 · unverdicted · none · ref 69

ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation
cs.CV · 2026-05-08 · unverdicted · none · ref 29

Coding Agent Is Good As World Simulator
cs.AI · 2026-05-14 · unreviewed · ref 4

Do Joint Audio-Video Generation Models Understand Physics?
cs.SD · 2026-05-08 · unreviewed · ref 5

VRAG: Learning World Models for Interactive Video Generation
cs.CV · 2025-05-28 · unreviewed · ref 9
