arXiv preprint arXiv:2601.10553

Jianhao Yuan, Xiaofeng Zhang, Felix Friedrich, Nicolas Beltran-Velez, Melissa Hall, Reyhane Askari-Hemmat, Xiaochuang Han, Nicolas Ballas, Michal Drozdzal, Adriana Romero-Soriano · 2026 · arXiv 2601.10553

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

cs.CV · 2026-06-04 · unverdicted · novelty 7.0

PhaseLock extracts motion priors from 2-step inference and enforces them via Latent Delta Guidance to raise physical consistency scores by 6.2 points on average in image-to-video diffusion models.

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

MotiMotion adds visual reasoning via a training-free VLM to refine primary trajectories and hallucinate secondary motions, plus a confidence-aware guidance scheme, yielding more plausible interactions on the new MotiBench benchmark.

GEOPHYS: The Geometry of Physical Plausibility

cs.CV · 2026-06-15 · unverdicted · novelty 6.0

GEOPHYS defines five geometric properties of per-frame embeddings from image encoders that detect physical implausibility in videos with SOTA accuracy and serve as an efficient verifier.

Prisma-World: Camera-Controllable Multi-Agent Video World Model

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

Prisma-World is a diffusion-based multi-agent video model that uses joint full-attention, multi-agent RoPE, and relative camera geometry injection plus curriculum training to produce consistent cross-view videos from flexible agent counts.

StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

StressDream optimizes initial noise in diffusion video world models using VLM semantic and plausibility objectives to steer generations toward specified high-impact outcomes for improved policy evaluation.

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

A multi-agent video world model using simplex rotary agent encoding and sparse hub attention achieves better fidelity, controllability, and consistency than baselines while generalizing from 2 to 4 players.

Proprio: Latent Self-Scoring and Inference-Time Refinement for Physically Plausible Video Generation

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

Proprio uses flow residuals from latent perturbations in frozen video generators as a self-scoring signal for physical plausibility, yielding reported gains of 16.5% on Physics-IQ and 20.6% on VideoPhy2-hard.

Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

Physics-IQ Verified

cs.CV · 2026-06-17 · unverdicted · novelty 5.0

Physics-IQ Verified refines 57.6% of samples and 34.8% of prompts from the original benchmark and produces moderate ranking shifts (Kendall's τ = 0.46) across six image-to-video models.

Physics-Informed Video Generation via Mixture-of-Experts Latent Alignment

cs.CV · 2026-06-03 · unverdicted · novelty 5.0

PILA aligns frozen flow-matching video models to a physics attribute bank via MoE experts and operational residuals, reporting SOTA physical plausibility on VBench-2.0, VideoPhy-2 and PhyGenBench while preserving visual quality.

PhyWorld: Physics-Faithful World Model for Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.

citing papers explorer

Showing 12 of 12 citing papers.

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them · cs.CV · 2026-06-04 · unverdicted · none · ref 45
MotiMotion: Motion-Controlled Video Generation with Visual Reasoning · cs.CV · 2026-05-21 · unverdicted · none · ref 85
GEOPHYS: The Geometry of Physical Plausibility · cs.CV · 2026-06-15 · unverdicted · none · ref 10
Prisma-World: Camera-Controllable Multi-Agent Video World Model · cs.CV · 2026-06-08 · unverdicted · none · ref 41
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement · cs.CV · 2026-05-29 · unverdicted · none · ref 94
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players · cs.CV · 2026-05-27 · unverdicted · none · ref 65
Proprio: Latent Self-Scoring and Inference-Time Refinement for Physically Plausible Video Generation · cs.CV · 2026-05-27 · unverdicted · none · ref 41
Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation · cs.CV · 2026-05-13 · unverdicted · none · ref 9
Human Cognition in Machines: A Unified Perspective of World Models · cs.RO · 2026-04-17 · unverdicted · none · ref 208
Physics-IQ Verified · cs.CV · 2026-06-17 · unverdicted · none · ref 22
Physics-Informed Video Generation via Mixture-of-Experts Latent Alignment · cs.CV · 2026-06-03 · unverdicted · none · ref 12
PhyWorld: Physics-Faithful World Model for Video Generation · cs.CV · 2026-05-19 · unverdicted · none · ref 23
