pith. sign in

arXiv preprint arXiv:2601.10553 , year=

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.CV 11 cs.RO 1

years

2026 12

verdicts

UNVERDICTED 12

roles

background 1

polarities

background 1

clear filters

representative citing papers

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

MotiMotion adds visual reasoning via a training-free VLM to refine primary trajectories and hallucinate secondary motions, plus a confidence-aware guidance scheme, yielding more plausible interactions on the new MotiBench benchmark.

GEOPHYS: The Geometry of Physical Plausibility

cs.CV · 2026-06-15 · unverdicted · novelty 6.0

GEOPHYS defines five geometric properties of per-frame embeddings from image encoders that detect physical implausibility in videos with SOTA accuracy and serve as an efficient verifier.

Prisma-World: Camera-Controllable Multi-Agent Video World Model

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

Prisma-World is a diffusion-based multi-agent video model that uses joint full-attention, multi-agent RoPE, and relative camera geometry injection plus curriculum training to produce consistent cross-view videos from flexible agent counts.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

Physics-IQ Verified

cs.CV · 2026-06-17 · unverdicted · novelty 5.0

Physics-IQ Verified refines 57.6% of samples and 34.8% of prompts from the original benchmark and produces moderate ranking shifts (Kendall's τ = 0.46) across six image-to-video models.

PhyWorld: Physics-Faithful World Model for Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.

citing papers explorer

Showing 12 of 12 citing papers.

  • Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them cs.CV · 2026-06-04 · unverdicted · none · ref 45

    PhaseLock extracts motion priors from 2-step inference and enforces them via Latent Delta Guidance to raise physical consistency scores by 6.2 points on average in image-to-video diffusion models.

  • MotiMotion: Motion-Controlled Video Generation with Visual Reasoning cs.CV · 2026-05-21 · unverdicted · none · ref 85

    MotiMotion adds visual reasoning via a training-free VLM to refine primary trajectories and hallucinate secondary motions, plus a confidence-aware guidance scheme, yielding more plausible interactions on the new MotiBench benchmark.

  • GEOPHYS: The Geometry of Physical Plausibility cs.CV · 2026-06-15 · unverdicted · none · ref 10

    GEOPHYS defines five geometric properties of per-frame embeddings from image encoders that detect physical implausibility in videos with SOTA accuracy and serve as an efficient verifier.

  • Prisma-World: Camera-Controllable Multi-Agent Video World Model cs.CV · 2026-06-08 · unverdicted · none · ref 41

    Prisma-World is a diffusion-based multi-agent video model that uses joint full-attention, multi-agent RoPE, and relative camera geometry injection plus curriculum training to produce consistent cross-view videos from flexible agent counts.

  • StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement cs.CV · 2026-05-29 · unverdicted · none · ref 94

    StressDream optimizes initial noise in diffusion video world models using VLM semantic and plausibility objectives to steer generations toward specified high-impact outcomes for improved policy evaluation.

  • Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players cs.CV · 2026-05-27 · unverdicted · none · ref 65

    A multi-agent video world model using simplex rotary agent encoding and sparse hub attention achieves better fidelity, controllability, and consistency than baselines while generalizing from 2 to 4 players.

  • Proprio: Latent Self-Scoring and Inference-Time Refinement for Physically Plausible Video Generation cs.CV · 2026-05-27 · unverdicted · none · ref 41

    Proprio uses flow residuals from latent perturbations in frozen video generators as a self-scoring signal for physical plausibility, yielding reported gains of 16.5% on Physics-IQ and 20.6% on VideoPhy2-hard.

  • Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation cs.CV · 2026-05-13 · unverdicted · none · ref 9

    Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.

  • Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 208

    The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

  • Physics-IQ Verified cs.CV · 2026-06-17 · unverdicted · none · ref 22

    Physics-IQ Verified refines 57.6% of samples and 34.8% of prompts from the original benchmark and produces moderate ranking shifts (Kendall's τ = 0.46) across six image-to-video models.

  • Physics-Informed Video Generation via Mixture-of-Experts Latent Alignment cs.CV · 2026-06-03 · unverdicted · none · ref 12

    PILA aligns frozen flow-matching video models to a physics attribute bank via MoE experts and operational residuals, reporting SOTA physical plausibility on VBench-2.0, VideoPhy-2 and PhyGenBench while preserving visual quality.

  • PhyWorld: Physics-Faithful World Model for Video Generation cs.CV · 2026-05-19 · unverdicted · none · ref 23

    PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.