Causal forcing: Autoregressive diffusion distillation done right for high-quality real-time interactive video generation
According to Pith's index, 12 papers cite this work.
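The title names a specific recipe: block-wise causal (autoregressive) video diffusion, distilled so that each block needs only a few denoising steps and can be generated in real time while conditioning on previously generated content (typically held in a KV cache, as several of the citing papers below assume). As rough orientation, the sketch below illustrates that kind of generation loop in generic form; it is not the paper's implementation, and the chunk size, step count, and the denoise_step/encode_kv placeholders are hypothetical stand-ins for a trained model.

```python
# Minimal conceptual sketch of causal autoregressive video generation with a
# distilled few-step denoiser. This is NOT the paper's implementation; the
# chunk size, step count, and the denoise_step/encode_kv placeholders are
# hypothetical stand-ins for a trained model.
import numpy as np

CHUNK_FRAMES = 4   # latent frames produced per autoregressive step (assumed)
NUM_STEPS = 4      # few-step sampling enabled by distillation (assumed)
LATENT_DIM = 16    # toy latent dimensionality

def denoise_step(x, kv_cache):
    """Stand-in for one pass of a distilled causal denoiser.
    A real model would attend over kv_cache, i.e. over past chunks only."""
    context = np.mean(kv_cache, axis=0) if kv_cache else np.zeros(LATENT_DIM)
    return 0.5 * x + 0.5 * context   # toy update that keeps shapes consistent

def encode_kv(chunk):
    """Stand-in for caching the keys/values of a finished chunk."""
    return chunk.mean(axis=0)

def generate(num_chunks, rng):
    kv_cache, video = [], []
    for _ in range(num_chunks):
        x = rng.standard_normal((CHUNK_FRAMES, LATENT_DIM))  # each chunk starts from noise
        for _ in range(NUM_STEPS):                           # only a few denoising steps
            x = denoise_step(x, kv_cache)
        kv_cache.append(encode_kv(x))   # later chunks condition on this one, never vice versa
        video.append(x)
    return np.concatenate(video, axis=0)

frames = generate(num_chunks=3, rng=np.random.default_rng(0))
print(frames.shape)   # (12, 16): 3 chunks x 4 toy latent frames
```

The point of the structure is that latency is bounded per chunk: every new chunk costs a fixed number of model calls plus a cache update, regardless of how long the video already is.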
Citing papers explorer
- CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
  CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
- HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation
  HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.
- MultiWorld: Scalable Multi-Agent Multi-View Video World Models
  MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
- Efficient Video Diffusion Models: Advancements and Challenges
  A survey that groups efficient video diffusion methods into four paradigms (step distillation, efficient attention, model compression, and cache/trajectory optimization) and outlines open challenges for practical use.
- Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation
  Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.
- Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
  Forcing-KV applies head-specific static and dynamic pruning to KV caches in AR video diffusion models, reaching over 29 fps, a 30% memory reduction, and up to a 2.82x speedup while maintaining quality (a generic sketch of such per-head cache policies follows this list).
- Human Cognition in Machines: A Unified Perspective of World Models
  The paper introduces a unified framework for world models that incorporates the full set of cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
- Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
  Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.
- Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
  Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and by using cache-conditioned feature alignment for autoregressive models.
- Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
  Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged; the paper organizes the relevant paradigms, architectures, and algorithms toward that goal.
- A Systematic Post-Train Framework for Video Generation
  A post-training pipeline for video generation models that combines SFT, RLHF with a novel GRPO variant, prompt enhancement, and inference optimization to improve visual quality, temporal coherence, and instruction following.
- Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
  Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
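Several of the entries above (Pyramid Forcing and Forcing-KV in particular) revolve around giving each attention head its own KV cache policy so that long autoregressive rollouts stay within a memory and compute budget. The sketch below illustrates that general idea with one static (sliding-window) and one dynamic (importance-based) eviction rule per head; the policies, budgets, and importance score are assumptions made for illustration, not the pruning algorithms from either paper.

```python
# Illustrative per-head KV cache with bounded memory. The "static" and
# "dynamic" policies, the budgets, and the importance score are hypothetical;
# they are not the pruning rules from Forcing-KV or Pyramid Forcing.
import numpy as np

class HeadCache:
    """Bounded (key, value) store for a single attention head."""

    def __init__(self, policy, budget):
        self.policy = policy     # "static": sliding window; "dynamic": importance-based
        self.budget = budget
        self.entries = []        # list of (key, value, score) triples

    def add(self, key, value, score=0.0):
        self.entries.append((key, value, score))
        if len(self.entries) <= self.budget:
            return
        if self.policy == "static":
            self.entries.pop(0)  # drop the oldest entry (fixed window)
        else:
            worst = min(range(len(self.entries)), key=lambda i: self.entries[i][2])
            self.entries.pop(worst)  # drop the least "important" entry

# Example: heads with different (hypothetical) budgets, so total memory stays bounded.
rng = np.random.default_rng(0)
heads = [HeadCache("static", 4), HeadCache("static", 16), HeadCache("dynamic", 8)]
for t in range(64):                                # 64 generated timesteps
    for head in heads:
        k = v = rng.standard_normal(8)
        head.add(k, v, score=float(rng.random()))  # score stands in for attention mass
print([len(h.entries) for h in heads])             # -> [4, 16, 8]
```

Once every head reaches its budget, the cache size, and with it the per-step attention cost, stops growing with video length, which is the property these long-horizon and streaming methods are after.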