pith. sign in

hub Canonical reference

MAGI-1: Autoregressive Video Generation at Scale

Canonical reference. 74% of citing Pith papers cite this work as background.

76 Pith papers citing it
Background 74% of classified citations
abstract

We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.

hub tools

citation-role summary

background 15 baseline 2 dataset 1 method 1

citation-polarity summary

years

2026 69 2025 7

clear filters

representative citing papers

PhysInOne: Visual Physics Learning and Reasoning in One Suite

cs.CV · 2026-04-10 · unverdicted · novelty 8.0

PhysInOne is a new dataset of 2 million videos across 153,810 dynamic 3D scenes covering 71 physical phenomena, shown to improve AI performance on physics-aware video generation, prediction, property estimation, and motion transfer.

Q-ARVD: Quantizing Autoregressive Video Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Q-ARVD introduces final-quality-aware frame weighting and outlier-aware adaptive dual-scale quantization to enable accurate low-bit inference for autoregressive video diffusion models.

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

DySink maintains a memory bank and retrieves relevant historical frames as dynamic sinks while using an anomaly gate to suppress collapse, yielding higher temporal quality and dynamic degree on minute-long videos.

Envisioning the Future, One Step at a Time

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

An autoregressive diffusion model on sparse point trajectories predicts multi-modal future scene dynamics from single images with orders-of-magnitude faster sampling than dense video simulators while matching accuracy.

DSA: Dynamic Step Allocation for Fast Autoregressive Video Generation

cs.CV · 2026-06-03 · unverdicted · novelty 6.0

DSA adds a jointly trained confidence head to autoregressive video diffusion models that dynamically allocates fewer or more denoising steps per frame, achieving 22.63 FPS real-time generation on H100 while matching VBench quality.

citing papers explorer

Showing 7 of 7 citing papers after filters.

  • Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation cs.CV · 2025-12-04 · conditional · none · ref 69 · internal anchor

    Reward Forcing combines EMA-Sink tokens and Rewarded Distribution Matching Distillation to deliver state-of-the-art streaming video generation at 23.1 FPS without copying initial frames.

  • Generative View Stitching cs.CV · 2025-10-28 · unverdicted · none · ref 8 · internal anchor

    Generative View Stitching samples full video sequences in parallel using off-the-shelf Diffusion Forcing models plus Omni Guidance to produce stable, collision-free, loop-closing camera-guided videos.

  • Self-Forcing++: Towards Minute-Scale High-Quality Video Generation cs.CV · 2025-10-02 · conditional · none · ref 55 · internal anchor

    Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.

  • Rolling Forcing: Autoregressive Long Video Diffusion in Real Time cs.CV · 2025-09-29 · unverdicted · none · ref 93 · internal anchor

    Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.

  • LongLive: Real-time Interactive Long Video Generation cs.CV · 2025-09-26 · conditional · none · ref 32 · internal anchor

    LongLive is a causal autoregressive video generator that produces up to 240-second interactive videos at 20.7 FPS on one H100 GPU after 32 GPU-days of fine-tuning from a 1.3B short-clip model.

  • Matrix-game 2.0: An open-source real-time and streaming interactive world model cs.CV · 2025-08-18 · unverdicted · none · ref 40 · internal anchor

    Matrix-Game 2.0 introduces a scalable data pipeline, action-injection module, and few-step distillation to enable real-time streaming video generation at 25 FPS from game-engine interactions, with open-sourced weights and code.

  • Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation cs.CV · 2025-11-25 · unverdicted · none · ref 33 · internal anchor

    Inferix provides an optimized inference engine for semi-autoregressive block-diffusion decoding to support high-quality, variable-length video generation in world simulation applications.