MapReason-OSM supplies 6000 graph-verifiable instances across 12 mobility tasks on rendered OSM maps from 10 U.S. downtowns and shows that seven VLMs succeed at simple routing but perform near chance on cost-based facility placement and cross-zoom consistency.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
GeoNatureAgent Benchmark tests seven LLMs on 93 tasks via a production geospatial API, with Claude Sonnet 4 at 60.8% and DeepSeek V3.2 offering near performance at 11x lower cost while all models fail on close-value comparisons.
SLM adds a dedicated spatial modality and training dataset to LLMs, enabling geometric spatial reasoning and outperforming prompt-based symbolic methods on the new SpatialEval benchmark.
citing papers explorer
-
MapReason-OSM: Can Vision-Language Models Make Graph-Verifiable Mobility Decisions from Street Maps ?
MapReason-OSM supplies 6000 graph-verifiable instances across 12 mobility tasks on rendered OSM maps from 10 U.S. downtowns and shows that seven VLMs succeed at simple routing but perform near chance on cost-based facility placement and cross-zoom consistency.
-
GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models
GeoNatureAgent Benchmark tests seven LLMs on 93 tasks via a production geospatial API, with Claude Sonnet 4 at 60.8% and DeepSeek V3.2 offering near performance at 11x lower cost while all models fail on close-value comparisons.
-
From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models
SLM adds a dedicated spatial modality and training dataset to LLMs, enabling geometric spatial reasoning and outperforming prompt-based symbolic methods on the new SpatialEval benchmark.