FLAT maps compressed video diffusion latents to explicit triangle splats via ray-centered rotation parameterization and a product window function, reporting better geometric accuracy than 3D Gaussian baselines under identical training.
Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems, 37:47455–47487, 2024a
20 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (20)
verdicts: UNVERDICTED (20)
roles: background (3)
representative citing papers
MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.
Future Forcing constructs a future query proxy from historical pre-RoPE statistics to score and merge KV tokens, improving subject consistency by up to 1.49 on VBench-Long for 60s AR video generation.
LongAV-Compass is a new benchmark and evaluation framework for minute-scale audio-visual generation across T2AV, I2AV, and V2AV with multi-dimensional assessment.
MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation featuring four dimensions, challenging scenarios, and an adaptive hybrid evaluation framework that achieves 91.5% Spearman correlation with human judgments.
LongLive-2.0 delivers an NVFP4 parallel infrastructure that enables direct training of long multi-shot autoregressive diffusion video models and achieves up to 2.15x training and 1.84x inference speedups on Blackwell and other GPUs.
MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
ActWorld extends navigation-centric world models to support mid-rollout object interactions via chunk-autoregressive generation, action-aware memory routing, and a persistent memory bank, backed by a 100K annotated interaction dataset.
AnchorEdit is the first autoregressive diffusion framework for causal multi-turn image editing, claiming state-of-the-art consistency over 10+ rounds via three-stage training and a memory mechanism.
Introduces the VideoWeaver benchmark (16 categories, 285 cases), plus agent-as-judge evaluation and a skill-evolution algorithm, to assess and improve agentic long video generation across frameworks.
RAVEN aligns training and inference for causal autoregressive video diffusion via interleaved rollout repacking and introduces CM-GRPO for direct RL on consistency-model kernels, claiming better quality than recent baselines.
Warp-as-History enables zero-shot camera trajectory following in frozen video models by supplying camera-warped pseudo-history, with single-video LoRA fine-tuning improving generalization to unseen videos.
EverAnimate restores drifted latent flow trajectories in chunked video generation via persistent latent propagation and restorative flow matching, achieving measurable gains in PSNR, SSIM, LPIPS, and FID over prior long-animation methods with only LoRA tuning.
HorizonDrive is a new anti-drifting autoregressive training and distillation method that enables minute-scale stable driving video rollouts by making the teacher model rollout-capable via scheduled rollout recovery and teacher rollout DMD.
Forcing-KV applies head-specific static and dynamic pruning to KV caches in AR video diffusion models, achieving over 29 fps, a 30% memory reduction, and up to 2.82x speedup while maintaining quality.
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
MaineCoon is presented as the first 22B-parameter real-time streaming audio-visual autoregressive model optimized for social-interactive applications, using novel training techniques and an agentic inference framework.
Introduces the CineDance-1M dataset for multi-shot long-form text-to-audio-video generation, along with CineBench and a model adaptation.
X-Foresight adds a long-horizon chunk-wise autoregressive world model with temporal importance sampling and curriculum learning to VLA architectures for improved planning and generative fidelity.
Focused Forcing is a training-free per-frame KV selection method that combines attention scores with diversity metrics and head-importance estimation to accelerate autoregressive video diffusion up to 1.48x while improving quality.