hub

Sti-bench: Are mllms ready for precise spatial-temporal world understanding?

Yun Li, Yiming Zhang, Tao Lin, Xiangrui Liu, Wenxiao Cai, Zheng Liu, Bo Zhao · 2025 · arXiv 2503.23765

13 Pith papers cite this work. Polarity classification is still indexing.

13 Pith papers citing it

read on arXiv browse 13 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

AirGroundBench is a new diagnostic benchmark exposing that MLLMs handle basic spatial perception but struggle with cross-view alignment, transformation reasoning, and embodied navigation under heterogeneous air-ground views.

CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Proposes Spatial Narrative Score (SNS) evaluation for VLMs' camera motion understanding and introduces CaMo model achieving consistent performance on SNS and direct QA.

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

ViSRA boosts MLLM 3D spatial reasoning performance by up to 28.9% on unseen tasks via a plug-and-play video-based agent that extracts explicit spatial cues from expert models without any post-training.

SCP: Spatial Causal Prediction in Video

cs.CV · 2026-03-04 · unverdicted · novelty 7.0

SCP defines a new benchmark task for predicting spatial causal outcomes beyond direct observation and shows that 23 leading models lag far behind humans on it.

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

cs.AI · 2025-11-26 · unverdicted · novelty 7.0

SpatialBench creates a five-level framework and 15-task benchmark to measure hierarchical spatial reasoning in MLLMs, finding strong basic perception but weak symbolic reasoning, causal inference, and planning.

SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

cs.CV · 2025-04-02 · unverdicted · novelty 7.0

SpaceR uses a new verifiable dataset and map-imagination-augmented RLVR to reach SOTA spatial reasoning accuracy in MLLMs, exceeding GPT-4o on VSI-Bench.

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

GASP injects geometric priors into VLMs via a deep-supervised correspondence head trained on video point correspondences and depth consistency, raising internal matching accuracy and delivering gains on spatial benchmarks without any 3D VQA data.

Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

Q-GeoMem uses question-guided scoring to maintain a Fine-Grained Context Bank and Semantic-Geometric Evidence Bank, achieving SOTA on VSI-Bench and VSTI-Bench.

Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

Flat-Pack Bench is a new evaluation suite that shows state-of-the-art LVLMs perform poorly on nuanced spatio-temporal reasoning required for furniture assembly videos.

LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.

SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning

cs.CV · 2026-03-28 · unverdicted · novelty 6.0

SpatialStack improves 3D spatial reasoning in vision-language models by stacking and synchronizing multi-level geometric features with the language backbone.

SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

cs.CV · 2025-12-11 · conditional · novelty 6.0

SpaceDrive integrates 3D positional encodings derived from depth and ego-states into VLMs, replacing digit tokens to improve spatial reasoning and trajectory regression in autonomous driving.

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

cs.CV · 2025-05-29 · unverdicted · novelty 6.0 · 2 refs

Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.

citing papers explorer

Showing 1 of 1 citing paper after filters.

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition cs.AI · 2025-11-26 · unverdicted · none · ref 41
SpatialBench creates a five-level framework and 15-task benchmark to measure hierarchical spatial reasoning in MLLMs, finding strong basic perception but weak symbolic reasoning, causal inference, and planning.

Sti-bench: Are mllms ready for precise spatial-temporal world understanding?

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer