A survey for foundation models in autonomous driving
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (4)
roles: background (1)
polarities: background (1)
representative citing papers
SCP defines a new benchmark task for predicting spatial causal outcomes beyond direct observation and shows that 23 leading models lag far behind humans on it.
SoccerMaster is the first soccer-specific vision foundation model that unifies tasks from player detection to event classification via multi-task pretraining and outperforms task-specific models on downstream evaluations.
Framework learns latent scene embeddings from 24 trajectory datasets to produce transferability scores that correlate with cross-dataset model performance.
citing papers explorer
- MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.