Title resolution pending

Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Cha · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

baseline 2

citation-polarity summary

baseline 2

representative citing papers

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

cs.LG · 2026-02-20 · conditional · novelty 6.0 · 2 refs

MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.

BEDTime: A Unified Benchmark for Automatically Describing Time Series

cs.CL · 2025-09-05 · conditional · novelty 6.0

BEDTime benchmark tests 17 models on describing time series structure and finds vision-language models outperform dedicated time-series-language models and language-only approaches, with all models fragile to robustness tests.

citing papers explorer

Showing 2 of 2 citing papers.

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?
cs.LG · 2026-02-20 · conditional · none · ref 1 · 2 links
BEDTime: A Unified Benchmark for Automatically Describing Time Series
cs.CL · 2025-09-05 · conditional · none · ref 47
