pith. machine review for the scientific record.

arxiv: 2602.18600 · v2 · submitted 2026-02-20 · 💻 cs.LG

Recognition: unknown

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

Authors on Pith: no claims yet
classification: 💻 cs.LG
keywords: MapTab, MLLMs, multi-criteria, multimodal, route planning, reasoning

Systematic evaluation of Multimodal Large Language Models (MLLMs) is crucial for advancing Artificial General Intelligence (AGI). However, existing benchmarks remain insufficient for rigorously assessing their reasoning capabilities under multi-criteria constraints. To bridge this gap, we introduce MapTab, a multimodal benchmark specifically designed to evaluate holistic multi-criteria reasoning in MLLMs via route planning tasks. MapTab requires MLLMs to perceive and ground visual cues from map images alongside route attributes (e.g., Time, Price) from structured tabular data. The benchmark encompasses two scenarios: Metromap, covering metro networks in 160 cities across 52 countries, and Travelmap, depicting 168 representative tourist attractions from 19 countries. In total, MapTab comprises 328 images, 196,800 route planning queries, and 3,936 QA queries, all incorporating 4 key criteria: Time, Price, Comfort, and Reliability. Extensive evaluations across 15 representative MLLMs reveal that current models face substantial challenges in multi-criteria multimodal reasoning. Notably, under conditions of limited visual perception, multimodal collaboration often underperforms compared to unimodal approaches. We believe MapTab provides a challenging and realistic testbed to advance the systematic evaluation of MLLMs. Our code is available at https://github.com/Ziqiao-Shang/MapTab.
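As a rough illustration (not drawn from the paper's code), the multi-criteria route planning that MapTab evaluates can be framed as picking the route that scores best under a weighting of criteria such as Time, Price, Comfort, and Reliability. A minimal sketch, with hypothetical routes, normalized costs, and weights:

```python
# Hypothetical multi-criteria route selection sketch.
# All criterion values are normalized costs in [0, 1] where lower is better
# (comfort and reliability are inverted into discomfort/unreliability).

routes = {
    "A": {"time": 0.4, "price": 0.2, "discomfort": 0.5, "unreliability": 0.1},
    "B": {"time": 0.2, "price": 0.6, "discomfort": 0.3, "unreliability": 0.3},
    "C": {"time": 0.5, "price": 0.1, "discomfort": 0.2, "unreliability": 0.2},
}
weights = {"time": 0.4, "price": 0.3, "discomfort": 0.2, "unreliability": 0.1}

def score(attrs):
    """Weighted sum of normalized costs; lower is better."""
    return sum(weights[k] * attrs[k] for k in weights)

best = min(routes, key=lambda r: score(routes[r]))
print(best)  # → C
```

An MLLM solving a MapTab query must additionally ground the candidate routes from a map image and their attribute values from a table before any such comparison can happen, which is where the benchmark reports models struggling.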

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.