Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.
Physgame: Uncovering physical commonsense violations in gameplay videos
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
RESP uses reference-guided sequential prompting with VLMs to improve frame-level and video-level visual glitch detection in games by establishing per-video baselines.
Introduces the first benchmark for open-ended video game glitch detection with temporal localization and proposes GliDe, an agentic framework that achieves stronger performance than vanilla multimodal models.
TempGlitch is a controlled benchmark showing that 12 evaluated VLMs perform near chance level on detecting five types of temporal glitches in gameplay videos, with denser sampling and larger models providing no reliable improvement.
citing papers explorer
-
Do generative video models understand physical principles?
Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.
-
PhyGround: Benchmarking Physical Reasoning in Generative World Models
PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
-
RESP: Reference-guided Sequential Prompting for Visual Glitch Detection in Video Games
RESP uses reference-guided sequential prompting with VLMs to improve frame-level and video-level visual glitch detection in games by establishing per-video baselines.
-
Open-Ended Video Game Glitch Detection with Agentic Reasoning and Temporal Grounding
Introduces the first benchmark for open-ended video game glitch detection with temporal localization and proposes GliDe, an agentic framework that achieves stronger performance than vanilla multimodal models.
-
TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos
TempGlitch is a controlled benchmark showing that 12 evaluated VLMs perform near chance level on detecting five types of temporal glitches in gameplay videos, with denser sampling and larger models providing no reliable improvement.