MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.
Omni-worldbench: Towards a comprehensive interaction-centric evaluation for world models.arXiv preprint arXiv:2603.22212, 2026
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.
citing papers explorer
-
MBench: A Comprehensive Benchmark on Memory Capability for Video World Models
MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.
-
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.