YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
arXiv preprint arXiv:2510.11512
5 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (5)
Verdicts: UNVERDICTED (5)
Representative citing papers
GEOPHYS defines five geometric properties of per-frame embeddings from image encoders that detect physical implausibility in videos with SOTA accuracy and serve as an efficient verifier.
Proprio uses flow residuals from latent perturbations in frozen video generators as a self-scoring signal for physical plausibility, yielding reported gains of 16.5% on Physics-IQ and 20.6% on VideoPhy2-hard.
Einstein World Models integrate visual rollouts from a callable world-module into LLM reasoning traces to support complex thought beyond language.
Assembles an MPM simulation dataset and compares code generation with video diffusion for inferring physical parameters and extrapolating dynamics from videos.
citing papers explorer
- YoCausal: How Far is Video Generation from World Model? A Causality Perspective
  YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
- GEOPHYS: The Geometry of Physical Plausibility
  GEOPHYS defines five geometric properties of per-frame embeddings from image encoders that detect physical implausibility in videos with SOTA accuracy and serve as an efficient verifier.
- Proprio: Latent Self-Scoring and Inference-Time Refinement for Physically Plausible Video Generation
  Proprio uses flow residuals from latent perturbations in frozen video generators as a self-scoring signal for physical plausibility, yielding reported gains of 16.5% on Physics-IQ and 20.6% on VideoPhy2-hard.