Impossible videos
5 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (5)
Roles: background (2)
Polarities: background (2)
Representative citing papers
The YoCausal benchmark shows that video diffusion models detect the arrow of time but, relative to humans, lack genuine causal understanding.
PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
VideoASMR-Bench shows that state-of-the-art VLMs fail to reliably distinguish AI-generated ASMR videos from real ones, though humans can still identify the fakes relatively easily.
The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.
citing papers explorer
- No Place to Hide: Benchmarking Video Hallucination with Background-Controlled Pairs
  Introduces the VidPair-Halluc benchmark of 1K background-controlled adversarial video pairs and 11K QA pairs, generated via the PairFlow pipeline, to evaluate hallucination in LVMs.
- YoCausal: How Far is Video Generation from World Model? A Causality Perspective
  The YoCausal benchmark shows that video diffusion models detect the arrow of time but, relative to humans, lack genuine causal understanding.
- VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?
  VideoASMR-Bench shows that state-of-the-art VLMs fail to reliably distinguish AI-generated ASMR videos from real ones, though humans can still identify the fakes relatively easily.