Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395
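The cited paper introduces RANSAC. Its core loop fits a model to a minimal random sample, counts the points that agree with that model within a tolerance (the consensus set), and keeps the best hypothesis. The following is a minimal Python sketch for 2D line fitting. It assumes a fixed iteration count and a "keep the largest consensus set" rule, which is a common variant rather than the paper's exact stopping procedure. The paper's final refit of the model on the consensus set is omitted. The names `line_through` and `ransac_line` and their parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]  # a*x + b*y + c = 0, normalized so a^2 + b^2 = 1


def line_through(p: Point, q: Point) -> Optional[Line]:
    """Normalized implicit line through two points, or None if the points coincide."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2  # normal vector to the direction (x2 - x1, y2 - y1)
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points: Sequence[Point], threshold: float,
                iterations: int = 200, seed: int = 0) -> Tuple[Optional[Line], List[int]]:
    """Return the line with the largest consensus set and the indices of its inliers."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    best_line: Optional[Line] = None
    best_inliers: List[int] = []
    for _ in range(iterations):
        # Minimal sample: two distinct points determine a line hypothesis.
        i, j = rng.sample(range(len(points)), 2)
        model = line_through(points[i], points[j])
        if model is None:
            continue
        a, b, c = model
        # Consensus set: points whose perpendicular distance is within the tolerance.
        inliers = [k for k, (x, y) in enumerate(points)
                   if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = model, inliers
    return best_line, best_inliers
```

For example, with ten points on y = 2x + 1 plus three gross outliers and `threshold=0.1`, the returned inlier indices should be exactly the ten points on the line.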
3 Pith papers cite this work.
Fields: cs.CV (3)
Years: 2026 (3)
Verdicts: unverdicted (3)
Citing papers
- ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
  ViSRA boosts MLLM 3D spatial reasoning performance by up to 28.9% on unseen tasks via a plug-and-play video-based agent that extracts explicit spatial cues from expert models without any post-training.
- Disambiguating 2D-3D Correspondences in Gaussian Splatting-based Feature Fields for Visual Localization
  SplitGS-Loc disambiguates 2D-3D correspondences in photometrically optimized GSFFs via Mixture-of-Gaussians splitting and multi-view consistency selection, yielding stable PnP and SOTA localization results.
- TriDE: Triangle-Consistent Translation Directions for Global Camera Pose Estimation
  TriDE refines unreliable pairwise translation directions through message passing on weighted triangles in the viewing graph and provides a phase-transition bound for exact recovery under random corruption.
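As background on what triangle consistency can mean for translation directions, consider three camera centers c_i, c_j, c_k. The unit directions t_ij, t_jk, t_ki run along the three edges of a closed triangle, so they are linearly dependent and their scalar triple product is zero. A corrupted direction generally breaks this, and the absolute triple product then gives a simple per-triangle residual. The Python sketch below is a generic illustration under that assumption. It is not TriDE's weighting, message passing, or recovery bound, and the names `unit_direction`, `cross`, and `triangle_residual` are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def unit_direction(c_from: Vec3, c_to: Vec3) -> Vec3:
    """Unit vector pointing from camera center c_from to camera center c_to."""
    d = [b - a for a, b in zip(c_from, c_to)]
    n = math.sqrt(sum(x * x for x in d))
    if n == 0.0:
        raise ValueError("camera centers coincide")
    return (d[0] / n, d[1] / n, d[2] / n)


def cross(u: Vec3, v: Vec3) -> Vec3:
    """Cross product u x v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def triangle_residual(t_ij: Vec3, t_jk: Vec3, t_ki: Vec3) -> float:
    """|t_ij . (t_jk x t_ki)|, which is zero when the three directions are coplanar."""
    w = cross(t_jk, t_ki)
    return abs(sum(a * b for a, b in zip(t_ij, w)))
```

With centers (0, 0, 0), (1, 0, 0), and (0, 2, 1), the three exact directions give a residual of zero up to rounding. Replacing t_ki with an arbitrary unit vector such as (0, 1, 0) gives a residual of 1/√6 ≈ 0.41, which a consistency check could flag.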