In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5828–5839.
1 Pith paper cites this work.
Fields: cs.CV (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).

Representative citing papers:

- GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning
GeoAlign dynamically aggregates multi-layer geometric features via content-aware sparse routing to achieve state-of-the-art spatial reasoning in a compact 4B MLLM, outperforming larger models on VSI-Bench, ScanQA, and SQA3D.