pith — machine review for the scientific record


Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

11 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary: background (2)

citation-polarity summary: background (2)

fields: cs.CV (11)

years: 2026 (9) · 2025 (2)

representative citing papers

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

Qwen-Image-VAE-2.0 Technical Report

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Qwen-Image-VAE-2.0 achieves state-of-the-art high-compression image reconstruction and superior diffusability for diffusion models, with a new text-rich document benchmark.

MAGI-1: Autoregressive Video Generation at Scale

cs.CV · 2025-05-19 · unverdicted · novelty 6.0

MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially under increasing noise, enabling causal, scalable, streaming generation with contexts of up to 4M tokens.

Motif-Video 2B: Technical Report

cs.CV · 2026-04-14 · unverdicted · novelty 5.0

Motif-Video 2B achieves 83.76% VBench score, beating a 14B-parameter baseline with 7x fewer parameters and substantially less training data through shared cross-attention and a three-part backbone.

Qwen-Image-2.0 Technical Report

cs.CV · 2026-05-11 · unverdicted · novelty 4.0

Qwen-Image-2.0 unifies high-fidelity image generation and precise editing by coupling Qwen3-VL with a Multimodal Diffusion Transformer, improving text rendering, photorealism, and complex prompt following over prior versions.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 11 of 11 citing papers.

  • HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation cs.CV · 2026-05-12 · conditional · none · ref 14

    HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.

  • Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs cs.CV · 2026-05-10 · unverdicted · none · ref 29

    PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.

  • Efficient Video Diffusion Models: Advancements and Challenges cs.CV · 2026-04-17 · unverdicted · none · ref 100

    A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

  • Qwen-Image-VAE-2.0 Technical Report cs.CV · 2026-05-13 · unverdicted · none · ref 12

    Qwen-Image-VAE-2.0 achieves state-of-the-art high-compression image reconstruction and superior diffusability for diffusion models, with a new text-rich document benchmark.

  • Leveraging Verifier-Based Reinforcement Learning in Image Editing cs.CV · 2026-04-30 · unverdicted · none · ref 38

    Edit-R1 trains a CoT-based reasoning reward model with GCPO and uses it to boost image editing performance over VLMs and models like FLUX.1-kontext via GRPO.

  • DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior cs.CV · 2026-04-19 · unverdicted · none · ref 23

    DreamShot uses video diffusion priors and a role-attention consistency loss to produce coherent, personalized storyboards with better character and scene continuity than text-to-image methods.

  • MAGI-1: Autoregressive Video Generation at Scale cs.CV · 2025-05-19 · unverdicted · none · ref 30

MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially under increasing noise, enabling causal, scalable, streaming generation with contexts of up to 4M tokens.

  • VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness cs.CV · 2025-03-27 · accept · none · ref 62

    VBench-2.0 is a benchmark suite that automatically evaluates video generative models on five dimensions of intrinsic faithfulness: Human Fidelity, Controllability, Creativity, Physics, and Commonsense using VLMs, LLMs, and anomaly detection methods.

  • Motif-Video 2B: Technical Report cs.CV · 2026-04-14 · unverdicted · none · ref 24

    Motif-Video 2B achieves 83.76% VBench score, beating a 14B-parameter baseline with 7x fewer parameters and substantially less training data through shared cross-attention and a three-part backbone.

  • Qwen-Image-2.0 Technical Report cs.CV · 2026-05-11 · unverdicted · none · ref 19

    Qwen-Image-2.0 unifies high-fidelity image generation and precise editing by coupling Qwen3-VL with a Multimodal Diffusion Transformer, improving text rendering, photorealism, and complex prompt following over prior versions.

  • Evolution of Video Generative Foundations cs.CV · 2026-04-07 · unverdicted · none · ref 82

    This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.