Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, Jun Zhu · 2026 · cs.CV · arXiv 2602.02214

Canonical reference. 78% of citing Pith papers cite this work as background.

50 Pith papers citing it

Background 78% of classified citations

abstract

To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural gap when full attention is replaced by causal attention. However, existing approaches do not bridge this gap theoretically. They initialize the AR student via ODE distillation, which requires frame-level injectivity, where each noisy frame must map to a unique clean frame under the PF-ODE of an AR teacher. Distilling an AR student from a bidirectional teacher violates this condition, preventing recovery of the teacher's flow map and instead inducing a conditional-expectation solution, which degrades performance. To address this issue, we propose Causal Forcing, which uses an autoregressive teacher for ODE initialization to bridge the architectural gap, and then applies the same DMD procedure as in Self Forcing. Empirical results show that our method outperforms all baselines across all metrics, surpassing the SOTA Self Forcing by 19.3% in Dynamic Degree, 8.7% in VisionReward, and 16.7% in Instruction Following. Project page: https://thu-ml.github.io/CausalForcing.github.io/; code: https://github.com/thu-ml/Causal-Forcing.
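
The injectivity argument above can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not code from the Causal Forcing repository: each "frame" is a scalar, noise takes only the values -1 and 1, and both teacher functions are hypothetical stand-ins for a PF-ODE flow map. A causal student that sees only the current frame's noise is fit by least squares. When the target also depends on the next frame's noise (the bidirectional-teacher case), one student input is paired with several targets, and the least-squares optimum is their average, i.e. a conditional expectation rather than any teacher output. When the target depends only on causally visible inputs (the AR-teacher case), the fit recovers the teacher exactly.

```python
import itertools
from collections import defaultdict

# Toy setup (hypothetical): scalar "frames", noise restricted to {-1, 1}.
NOISE_VALUES = [-1.0, 1.0]


def bidirectional_teacher(z_cur, z_next):
    # Hypothetical flow map of a full-attention teacher: the clean current
    # frame depends on the current AND the next frame's noise.
    return z_cur + z_next


def causal_teacher(z_cur):
    # Hypothetical flow map of an AR teacher: the clean current frame
    # depends only on inputs a causal model can see.
    return 2.0 * z_cur


def fit_causal_student(pairs):
    # Least-squares fit of a lookup-table student that only sees z_cur.
    # For each input, the squared-error minimizer is the mean of its
    # targets, i.e. the empirical conditional expectation E[x0 | z_cur].
    groups = defaultdict(list)
    for z_cur, x0 in pairs:
        groups[z_cur].append(x0)
    return {z: sum(ys) / len(ys) for z, ys in groups.items()}


def mse(student, pairs):
    # Mean squared error of the student against the teacher's targets.
    return sum((student[z] - x0) ** 2 for z, x0 in pairs) / len(pairs)


# Distillation pairs: (noisy input seen by the student, teacher's clean output).
pairs_bi = [(a, bidirectional_teacher(a, b))
            for a, b in itertools.product(NOISE_VALUES, repeat=2)]
pairs_ar = [(a, causal_teacher(a)) for a in NOISE_VALUES]

student_bi = fit_causal_student(pairs_bi)
student_ar = fit_causal_student(pairs_ar)

print(student_bi, mse(student_bi, pairs_bi))  # {-1.0: -1.0, 1.0: 1.0} 1.0
print(student_ar, mse(student_ar, pairs_ar))  # {-1.0: -2.0, 1.0: 2.0} 0.0
```

In the bidirectional case the student predicts -1.0 for input -1.0, a value the teacher never produces there (its outputs are -2.0 and 0.0); this averaged prediction is the toy analogue of the conditional-expectation solution. In the AR-teacher case each input has a single target, mirroring the frame-level injectivity condition, so the fit is exact.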


citation-role summary

background 8 · method 1

citation-polarity summary

background 7 · unclear 1 · use method 1

representative citing papers

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

cs.CV · 2026-05-28 · unverdicted · novelty 8.0

VideoMLA applies multi-head latent attention with 3D-RoPE decoupling to autoregressive video diffusion, delivering 92.7% KV memory reduction while matching short-horizon baselines and leading long-horizon VBench scores.

Towards Memory-Efficient Autoregressive Video Generation via Instance-Specific Parametric Absorption

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

ISPA reduces KV cache size by up to 50% in AR video models by transitioning layers to local attention and applying instance-specific least-squares weight modulation to compensate for lost history.

TempAct: Advancing Temporal Plausibility in Autoregressive Video Generation via Planner-Executor RL

cs.CV · 2026-06-26 · unverdicted · novelty 7.0 · 2 refs

TempAct introduces a planner-executor RL framework with hierarchical group exploration and rewards to improve temporal consistency in autoregressive video diffusion models.

TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment

cs.CV · 2026-06-11 · unverdicted · novelty 7.0

TetherCache organizes KV-cache into sink, memory, and recent regions and applies gated recall with attention-diversity balancing plus trusted memory editing to stabilize long-horizon autoregressive video diffusion.

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

LongLive-RAG formulates long video generation as retrieval-augmented generation by treating self-generated latents as a dynamic searchable history and adding a Window Temporal Delta Loss for better retrieval.

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.

AdaState: Self-Evolving Anchors for Streaming Video Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.

Future Forcing: Future-aware Training-free KV Cache Policy for Autoregressive Video Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

Future Forcing constructs a future query proxy from historical pre-RoPE statistics to score and merge KV tokens, improving subject consistency by up to 1.49 on VBench-Long for 60s AR video generation.

Q-ARVD: Quantizing Autoregressive Video Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Q-ARVD introduces final-quality-aware frame weighting and outlier-aware adaptive dual-scale quantization to enable accurate low-bit inference for autoregressive video diffusion models.

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

DySink maintains a memory bank and retrieves relevant historical frames as dynamic sinks while using an anomaly gate to suppress collapse, yielding higher temporal quality and dynamic degree on minute-long videos.

Goodbye Drift: Anchored Tree Sampling for Long-Horizon Video-to-Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Anchored Tree Sampling converts horizon-compounding drift into anchor-bounded drift by organizing video generation as a sparse-to-dense tree of imputations instead of left-to-right autoregressive rollout.

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

LongLive-2.0 delivers an NVFP4 parallel infrastructure that enables direct training of long multi-shot autoregressive diffusion video models and achieves up to 2.15x training and 1.84x inference speedups on Blackwell and other GPUs.

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

HandsOnWorld: Unconstrained Egocentric Video Generation with Camera-Disentangled Hand Control

cs.CV · 2026-07-02 · unverdicted · novelty 6.0

HandsOnWorld creates a hand-controlled egocentric video generator from unconstrained monocular video via a new EgoVid-Pro dataset from monocular reconstruction and a Plücker Hand Map that disentangles camera and hand motion.

ABot-M0.5: Unified Mobility-and-Manipulation World Action Model

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

cs.CV · 2026-06-25 · unverdicted · novelty 6.0

LiveEdit distills a bidirectional video foundation model into a unidirectional streaming editor via three-stage training plus mask caching to reach 12.66 FPS with stable edits.

ZeroGVC: Zero-Shot Generative Video Compression with Autoregressive Diffusion Priors

eess.IV · 2026-06-21 · unverdicted · novelty 6.0

ZeroGVC performs zero-shot generative video compression by guiding pretrained autoregressive diffusion priors with codebook noise vectors for P-frames after encoding the initial I-frame.

ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

cs.CV · 2026-06-16 · unverdicted · novelty 6.0

ActWorld extends navigation-centric world models to support mid-rollout object interactions via chunk-autoregressive generation, action-aware memory routing, and a persistent memory bank, backed by a 100K annotated interaction dataset.

AnchorEdit: Maintaining Temporal Consistency in Multi-turn Image Editing via Causal Memory

cs.CV · 2026-06-10 · unverdicted · novelty 6.0

AnchorEdit is presented as the first autoregressive diffusion framework for causal multi-turn image editing and reports SOTA consistency over 10+ rounds via three-stage training and a memory mechanism.

K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

K-Forcing introduces progressive self-forcing distillation to train a conditional push-forward model that jointly decodes k future tokens per forward pass, yielding 2.4-3.5x speedup at k=4 with modest quality loss on LM1B and OpenWebText.

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

cs.MM · 2026-06-03 · unverdicted · novelty 6.0

Echo-Infinity replaces handcrafted KV-cache schedules with end-to-end optimized Memory Queries and a Unified Relative RoPE recipe to support real-time infinite video generation in diffusion transformers.

Video-Mirai: Autoregressive Video Diffusion Models Need Foresight

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

Video-Mirai is a training method that distills non-causal future targets into causal video diffusion states to boost long-horizon consistency without changing inference architecture or cost.

citing papers explorer

Showing 47 of 47 citing papers after filters.

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion · cs.CV · 2026-05-28 · unverdicted · none · ref 31
VideoMLA applies multi-head latent attention with 3D-RoPE decoupling to autoregressive video diffusion, delivering 92.7% KV memory reduction while matching short-horizon baselines and leading long-horizon VBench scores.
Towards Memory-Efficient Autoregressive Video Generation via Instance-Specific Parametric Absorption · cs.CV · 2026-07-01 · unverdicted · none · ref 56
ISPA reduces KV cache size by up to 50% in AR video models by transitioning layers to local attention and applying instance-specific least-squares weight modulation to compensate for lost history.
TempAct: Advancing Temporal Plausibility in Autoregressive Video Generation via Planner-Executor RL · cs.CV · 2026-06-26 · unverdicted · none · ref 28 · 2 links
TempAct introduces a planner-executor RL framework with hierarchical group exploration and rewards to improve temporal consistency in autoregressive video diffusion models.
TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment · cs.CV · 2026-06-11 · unverdicted · none · ref 21
TetherCache organizes KV-cache into sink, memory, and recent regions and applies gated recall with attention-diversity balancing plus trusted memory editing to stabilize long-horizon autoregressive video diffusion.
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation · cs.CV · 2026-06-01 · unverdicted · none · ref 67
LongLive-RAG formulates long video generation as retrieval-augmented generation by treating self-generated latents as a dynamic searchable history and adding a Window Temporal Delta Loss for better retrieval.
MBench: A Comprehensive Benchmark on Memory Capability for Video World Models · cs.CV · 2026-05-30 · unverdicted · none · ref 100
MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.
AdaState: Self-Evolving Anchors for Streaming Video Generation · cs.CV · 2026-05-28 · unverdicted · none · ref 35
AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.
Future Forcing: Future-aware Training-free KV Cache Policy for Autoregressive Video Generation · cs.CV · 2026-05-28 · unverdicted · none · ref 61
Future Forcing constructs a future query proxy from historical pre-RoPE statistics to score and merge KV tokens, improving subject consistency by up to 1.49 on VBench-Long for 60s AR video generation.
Q-ARVD: Quantizing Autoregressive Video Diffusion Models · cs.CV · 2026-05-20 · unverdicted · none · ref 26
Q-ARVD introduces final-quality-aware frame weighting and outlier-aware adaptive dual-scale quantization to enable accurate low-bit inference for autoregressive video diffusion models.
DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation · cs.CV · 2026-05-20 · unverdicted · none · ref 22 · 2 links
DySink maintains a memory bank and retrieves relevant historical frames as dynamic sinks while using an anomaly gate to suppress collapse, yielding higher temporal quality and dynamic degree on minute-long videos.
Goodbye Drift: Anchored Tree Sampling for Long-Horizon Video-to-Video Generation · cs.CV · 2026-05-19 · unverdicted · none · ref 9
Anchored Tree Sampling converts horizon-compounding drift into anchor-bounded drift by organizing video generation as a sparse-to-dense tree of imputations instead of left-to-right autoregressive rollout.
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation · cs.CV · 2026-05-18 · unverdicted · none · ref 82
LongLive-2.0 delivers an NVFP4 parallel infrastructure that enables direct training of long multi-shot autoregressive diffusion video models and achieves up to 2.15x training and 1.84x inference speedups on Blackwell and other GPUs.
CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives · cs.CV · 2026-05-12 · unverdicted · none · ref 62
CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
MultiWorld: Scalable Multi-Agent Multi-View Video World Models · cs.CV · 2026-04-20 · unverdicted · none · ref 75
MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
Efficient Video Diffusion Models: Advancements and Challenges · cs.CV · 2026-04-17 · unverdicted · none · ref 200
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
HandsOnWorld: Unconstrained Egocentric Video Generation with Camera-Disentangled Hand Control · cs.CV · 2026-07-02 · unverdicted · none · ref 71
HandsOnWorld creates a hand-controlled egocentric video generator from unconstrained monocular video via a new EgoVid-Pro dataset from monocular reconstruction and a Plücker Hand Map that disentangles camera and hand motion.
ABot-M0.5: Unified Mobility-and-Manipulation World Action Model · cs.CV · 2026-07-01 · unverdicted · none · ref 81
ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.
LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing · cs.CV · 2026-06-25 · unverdicted · none · ref 75
LiveEdit distills a bidirectional video foundation model into a unidirectional streaming editor via three-stage training plus mask caching to reach 12.66 FPS with stable edits.
ZeroGVC: Zero-Shot Generative Video Compression with Autoregressive Diffusion Priors · eess.IV · 2026-06-21 · unverdicted · none · ref 49
ZeroGVC performs zero-shot generative video compression by guiding pretrained autoregressive diffusion priors with codebook noise vectors for P-frames after encoding the initial I-frame.
ActWorld: From Explorable to Interactive World Model via Action-Aware Memory · cs.CV · 2026-06-16 · unverdicted · none · ref 50
ActWorld extends navigation-centric world models to support mid-rollout object interactions via chunk-autoregressive generation, action-aware memory routing, and a persistent memory bank, backed by a 100K annotated interaction dataset.
AnchorEdit: Maintaining Temporal Consistency in Multi-turn Image Editing via Causal Memory · cs.CV · 2026-06-10 · unverdicted · none · ref 16
AnchorEdit is presented as the first autoregressive diffusion framework for causal multi-turn image editing and reports SOTA consistency over 10+ rounds via three-stage training and a memory mechanism.
K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling · cs.LG · 2026-06-09 · unverdicted · none · ref 54
K-Forcing introduces progressive self-forcing distillation to train a conditional push-forward model that jointly decodes k future tokens per forward pass, yielding 2.4-3.5x speedup at k=4 with modest quality loss on LM1B and OpenWebText.
Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation · cs.MM · 2026-06-03 · unverdicted · none · ref 50
Echo-Infinity replaces handcrafted KV-cache schedules with end-to-end optimized Memory Queries and a Unified Relative RoPE recipe to support real-time infinite video generation in diffusion transformers.
Video-Mirai: Autoregressive Video Diffusion Models Need Foresight · cs.CV · 2026-06-02 · unverdicted · none · ref 41
Video-Mirai is a training method that distills non-causal future targets into causal video diffusion states to boost long-horizon consistency without changing inference architecture or cost.
PointAction: 3D Points as Universal Action Representations for Robot Control · cs.RO · 2026-06-02 · unverdicted · none · ref 72
PointAction uses predicted dynamic 3D pointmaps from fine-tuned video models as an embodiment-agnostic action representation to map video predictions to executable robot actions.
LiveBand: Live Accompaniment Generation in the Audio Domain · cs.SD · 2026-06-02 · unverdicted · none · ref 39
LiveBand generates high-fidelity music accompaniments to live audio in real time via a causal transformer in audio latent space trained with adversarial sequence-level supervision.
Robust Dreamer: Deviation-Aware Latent Gaussian Memory for Action-Controlled AR Video Generation · cs.CV · 2026-05-29 · unverdicted · none · ref 81
Robust Dreamer uses Latent Gaussian Memory anchored to diffusion latents and Deviation Learning with a Dynamic Deviation Archive to reduce drift in long-horizon action-controlled image-to-video generation, reporting SOTA results on ScanNet, DL3DV, and OmniWorldGame.
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer · cs.CV · 2026-05-28 · unverdicted · none · ref 44
SANA-Streaming delivers 1280x704 streaming video editing at 24 FPS end-to-end on an RTX 5090 using hybrid DiT blocks, cycle-reverse training, and mixed-precision quantization.
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models · cs.CV · 2026-05-28 · unverdicted · none · ref 23
minWM supplies an end-to-end pipeline that fine-tunes bidirectional T2V/TI2V models with camera control, then distills them via Causal Forcing into few-step autoregressive generators for low-latency rollout.
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models · cs.CV · 2026-05-22 · unverdicted · none · ref 70
SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.
WorldKV: Efficient World Memory with World Retrieval and Compression · cs.CV · 2026-05-21 · unverdicted · none · ref 36
WorldKV enables persistent world memory in autoregressive video diffusion models by selectively retrieving and compressing KV-cache chunks, matching full-cache fidelity at roughly twice the throughput without training.
FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization · cs.CV · 2026-05-15 · unverdicted · none · ref 29 · 2 links
FashionChameleon achieves interactive multi-garment video customization at 23.8 FPS via in-context teacher models, streaming distillation, and training-free KV cache rescheduling while using only single-garment data.
Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation · cs.CV · 2026-05-13 · unverdicted · none · ref 3
Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.
HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation · cs.CV · 2026-05-12 · unverdicted · none · ref 32 · 2 links
HorizonDrive is a new anti-drifting autoregressive training and distillation method that enables minute-scale stable driving video rollouts by making the teacher model rollout-capable via scheduled rollout recovery and teacher rollout DMD.
Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models · cs.CV · 2026-05-10 · unverdicted · none · ref 49
Forcing-KV applies head-specific static and dynamic pruning to KV caches in AR video diffusion models, achieving over 29 fps, 30% memory reduction, and up to 2.82x speedup at maintained quality.
Human Cognition in Machines: A Unified Perspective of World Models · cs.RO · 2026-04-17 · unverdicted · none · ref 234
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms · eess.IV · 2026-03-30 · unverdicted · none · ref 75
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model · cs.CV · 2026-06-16 · unverdicted · none · ref 69
MaineCoon is presented as the first 22B-parameter real-time streaming audio-visual autoregressive model optimized for social-interactive applications, using novel training techniques and an agentic inference framework.
WorldOlympiad: Can Your World Model Survive a Triathlon? · cs.CV · 2026-06-09 · unverdicted · none · ref 55
WorldOlympiad is a new benchmark decomposing world-model evaluation into physical, geometry, and interaction tracks using segmentation, MLLM judges, Gaussian splatting, and action prompts on diverse scenarios.
Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions · cs.CV · 2026-06-08 · unverdicted · none · ref 10
Ultra Flash introduces a cascaded streaming super-resolution framework with specialized training, upsampling, and optimization to enable real-time high-resolution video generation from low-res diffusion models.
One-Forcing: Towards Stable One-Step Autoregressive Video Generation · cs.CV · 2026-05-22 · unverdicted · none · ref 10
One-Forcing augments DMD with a GAN loss to enable stable one-step causal autoregressive video generation, reporting a VBench score of 83.76 as SOTA among one-step methods.
One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems · cs.CV · 2026-05-21 · unverdicted · none · ref 58
A hierarchical multi-agent framework converts a single sentence into a short drama using debate-based scripting, 3D-grounded first frames for spatial consistency, and multi-stage reviewer loops.
Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion · cs.CV · 2026-05-18 · unverdicted · none · ref 60
Focused Forcing is a training-free per-frame KV selection method that combines attention scores with diversity metrics and head-importance estimation to accelerate autoregressive video diffusion up to 1.48x while improving quality.
Xiaomi Auto World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving · cs.CV · 2026-05-18 · unverdicted · none · ref 17 · 2 links
A unified system integrating sparse-query 3D Gaussian reconstruction with multi-stage causal video generation for autonomous driving world models.
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation · cs.CV · 2026-05-14 · unverdicted · none · ref 20
Causal Forcing++ applies causal consistency distillation to enable scalable frame-wise 1-2 step autoregressive video generation, outperforming prior 4-step chunk-wise methods on quality metrics while halving first-frame latency.
A Systematic Post-Train Framework for Video Generation · cs.CV · 2026-04-28 · unverdicted · none · ref 39
A post-training pipeline for video generation models combines SFT, RLHF with novel GRPO, prompt enhancement, and inference optimization to improve visual quality, temporal coherence, and instruction following.
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory · cs.CV · 2026-04-10 · unverdicted · none · ref 60
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
