hub

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang · 2025 · arXiv 2503.21755

20 Pith papers cite this work. Polarity classification is still indexing.

20 Pith papers citing it

open full Pith review browse 20 citing papers arXiv PDF

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 dataset 1

citation-polarity summary

background 2

representative citing papers

PhysInOne: Visual Physics Learning and Reasoning in One Suite

cs.CV · 2026-04-10 · unverdicted · novelty 8.0

PhysInOne is a new dataset of 2 million videos across 153,810 dynamic 3D scenes covering 71 physical phenomena, shown to improve AI performance on physics-aware video generation, prediction, property estimation, and motion transfer.

PhyGround: Benchmarking Physical Reasoning in Generative World Models

cs.CV · 2026-05-11 · accept · novelty 7.0

PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.

WorldJen: An End-to-End Multi-Dimensional Benchmark for Generative Video Models

cs.CV · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

WorldJen is a new benchmark for generative video models that uses VLM-judged multi-dimensional Likert questionnaires validated against human preferences to achieve perfect tier agreement.

Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

Hallo-Live achieves 20.38 FPS real-time text-to-audio-video avatar generation with 0.94s latency using asynchronous dual-stream diffusion and HP-DMD preference distillation, matching teacher model quality at 16x higher throughput.

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation

cs.RO · 2026-04-21 · unverdicted · novelty 7.0

RoboWM-Bench evaluates video world models by converting their outputs into executable robot actions and running them on manipulation tasks, showing that physical inconsistencies remain common.

AnimationBench: Are Video Models Good at Character-Centric Animation?

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

AnimationBench is the first benchmark that operationalizes the twelve basic principles of animation and IP preservation into scalable, VLM-assisted metrics for animation-style I2V generation.

Grounded Forcing: Bridging Time-Independent Semantics and Proximal Dynamics in Autoregressive Video Synthesis

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

Grounded Forcing introduces dual memory caching, reference-based positional embeddings, and proximity-weighted recaching to bridge stable semantics with local dynamics, improving long-range consistency in autoregressive video synthesis.

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

The paper presents WorldReasonBench, a benchmark that tests video generators on maintaining physical, social, logical, and informational consistency when predicting future states from initial conditions and actions.

SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

SARA improves text alignment and motion quality in video diffusion models by routing token-relation distillation supervision to semantically salient pairs using a Stage-1 aligner trained with SAM masks and InfoNCE.

HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

HuM-Eval evaluates human motion videos with a coarse-to-fine approach using VLM global checks plus 2D pose and 3D motion analysis, reaching 58.2% average correlation with human judgments and introducing a 1000-prompt benchmark.

Seeing Fast and Slow: Learning the Flow of Time in Videos

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

Self-supervised models learn to perceive and manipulate the flow of time in videos, supporting speed detection, large-scale slow-motion data curation, and temporally controllable video synthesis.

Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Long-CODE isolates long-context video evaluation with a new benchmark dataset and shot-dynamics metric that correlates better with human judgments on narrative richness and global consistency than short-video metrics.

VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation

cs.CV · 2026-04-11 · unverdicted · novelty 6.0

VGA-Bench creates a three-tier taxonomy, 1,016-prompt dataset of 60k+ videos, and three multi-task neural models (VAQA-Net, VTag-Net, VGQA-Net) that align with human judgments for video aesthetics and generation quality.

ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

ImVideoEdit learns video editing from 13K image pairs by decoupling spatial modifications from frozen temporal dynamics in pretrained models, matching larger video-trained systems in fidelity and consistency.

Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

Diffusion-APO synchronizes training noise with inference trajectories in video diffusion models to improve preference alignment and visual quality.

LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

cs.CV · 2026-05-06 · conditional · novelty 5.0

The PhyScore challenge creates the first benchmark requiring metrics to jointly score video quality, physical realism, condition alignment, and temporal consistency while localizing physical anomalies in 1554 videos from seven generative models across text-to-2D, image-to-4D, and video-to-4D tracks.

Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

Phantom generates visually realistic and physically consistent videos by jointly modeling visual content and latent physical dynamics via an abstract physics-aware representation.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE

cs.CV · 2026-05-04 · unverdicted · novelty 4.0

Mamoda2.5 is a 25B-parameter DiT-MoE unified AR-Diffusion model that reaches top video generation and editing benchmarks with 4-step inference up to 95.9x faster than baselines.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 2 of 2 citing papers after filters.

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation cs.RO · 2026-04-21 · unverdicted · none · ref 60 · internal anchor
RoboWM-Bench evaluates video world models by converting their outputs into executable robot actions and running them on manipulation tasks, showing that physical inconsistencies remain common.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 214 · internal anchor
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer